
Run YOLOv8 on LattePanda Mu (Intel N100 Processor) with OpenVINO

Introduction

This article introduces how to use DFRobot's latest micro x86 compute module, the LattePanda Mu, to run YOLOv8 accelerated with OpenVINO, achieving efficient and accurate object detection without the bulk and deployment hassle of a traditional high-performance computer.

 

As object detection technology becomes increasingly widespread across various fields, more industrial and commercial users are turning to YOLO for real-time detection, object tracking, and other applications. YOLOv8, released by Ultralytics in 2023, has been particularly well received. However, despite its power, YOLOv8 demands substantial computational resources, often causing lag on lightweight computing devices. While high-performance computers can meet YOLO's requirements, they are usually bulky and inconvenient to carry and deploy.

 

Therefore, in this article we use DFRobot's latest Mu core board to run YOLOv8 and accelerate it with OpenVINO. The LattePanda Mu is a micro x86 compute module equipped with an Intel N100 quad-core processor, 8GB of LPDDR5 memory, and 64GB of storage. The module exposes rich expansion interfaces, including three HDMI/DisplayPort outputs, eight USB 2.0 ports, up to four USB 3.2 ports, and up to nine PCIe 3.0 lanes. Combined with open-source carrier board design files, this allows users to easily design or customize carrier boards to meet their specific needs.

 

Unlike other products, this card-sized computing module can be easily embedded into space-constrained devices, delivering robust computing power without taking up much space. Additionally, the Intel N100 processor's TDP (thermal design power) can be adjusted between 6W and 35W, enabling users to flexibly choose between power consumption and performance to meet the needs of different application scenarios.

 

With this module, it is possible to achieve more efficient and accurate object detection in various application scenarios while enjoying the convenience of portable devices.

 

You can find all the code and model files used in this article in the GitHub repository linked below.

 

Native YOLO Deployment

First, we tested running the YOLOv8n model directly on the Mu without any quantization or acceleration. Performance is noticeably laggy, falling short of the high-speed, accurate streaming object recognition we are after. Start by installing the required libraries:

CODE
pip install ultralytics opencv-python numpy

These libraries include:

· ultralytics: A library for quickly setting up and running YOLO models.

· opencv-python: Provides rich image processing functions for handling and displaying images.

· numpy: Used for array and matrix operations; a commonly used numerical computation library in Python.

 

After the libraries are installed, download the GitHub files:

CODE
git clone https://github.com/winster-bai/yolo_on_MU.git
cd yolo_on_MU

Ensure that the Mu is connected to a USB camera, and then you can proceed with the following operations.

 


Figure 1. Connecting camera to any LattePanda USB port

 

Object Detection (CPU)

1. Download the model yolov8n.pt (included in the GitHub folder)

Official download link: https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt

 

The YOLOv8 object detection model includes 5 variants, all trained on the COCO dataset. The meanings of the model suffixes and their performance are as follows:

· n: Nano (ultra-lightweight)

· s: Small (small-sized)

· m: Medium (medium-sized)

· l: Large (large-sized)

· x: Extra Large (extra-large-sized)

 


Table 1. Performance comparison of YOLOv8 object detection models of different sizes

 


Table 2. Parameter Explanations

 

To achieve rapid object recognition, this example uses YOLOv8n for object detection. YOLOv8n is the lightest model in the YOLOv8 series, performing object detection with less compute and at higher speed while maintaining reasonable accuracy, which makes it particularly well suited to resource-constrained applications.

 

2. Run the yolo.py file

CODE
python yolo.py 

Code Explanation (a minimal sketch of the full script follows this list):

· Load YOLOv8n Model: Load the pre-trained YOLOv8n model using `YOLO('yolov8n.pt')`.

· Open the Camera: Open the default camera using `cv2.VideoCapture(0)`.

· Read Frames: Read video frames using `cap.read()`.

· Model Prediction: Pass the read frames to the YOLO model for object detection.

· Draw Detection Boxes: Draw rectangular boxes around detected objects' positions and display object labels and confidences.

· Calculate and Display FPS: Calculate the frame rate and display the FPS value on the top left corner of the frame.

· Display Frames: Display the processed frames using `cv2.imshow`.

· Exit the Loop: Exit the loop by pressing the 'Q' key.

· Release Resources: Release the camera and close all OpenCV windows.
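
Putting these steps together, a minimal sketch of such a script looks like this (an illustrative reconstruction using the standard Ultralytics and OpenCV APIs, not the exact code from the repository):

CODE
# Minimal sketch: YOLOv8n detection on a webcam stream with an FPS overlay.
import time

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # load the pre-trained YOLOv8n model
cap = cv2.VideoCapture(0)       # open the default USB camera

prev_time = time.time()
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame)          # run object detection on the frame
    annotated = results[0].plot()   # draw boxes, labels, and confidences

    # Calculate the instantaneous FPS and draw it in the top left corner.
    now = time.time()
    fps = 1.0 / (now - prev_time)
    prev_time = now
    cv2.putText(annotated, f"FPS: {fps:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("YOLOv8n detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # press 'Q' to exit
        break

cap.release()
cv2.destroyAllWindows()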

 

3. Running Result: 

When using the CPU on the Mu, the frame rate (FPS) for object detection with YOLOv8n ranges from roughly 4 to 7.

 


Figure 2. Running YOLOv8n object detection model on LattePanda Mu x86 computer module

 

Object Segmentation (CPU)

1. Download the official model (included in the GitHub folder)

https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-seg.pt

 

Similar to before, the five object segmentation models of YOLOv8 are trained on the COCO dataset, and their performance is as follows:


Table 3. Performance comparison of YOLOv8 object segmentation models of different sizes

 

2. Run the yolo_seg.py file

CODE
python yolo_seg.py

Code Explanation (a short illustrative snippet follows this list):

· Load YOLO Model: Instantiate the YOLOv8n model (yolov8n-seg.pt file) for object segmentation.

· Open the Camera: Open the camera using the VideoCapture function from the OpenCV library.

· Check if the Camera is Opened: Use the isOpened method to check if the camera is successfully opened. If not, output an error message and exit the program.

· Get the Width and Height of Video Frames: Use the get method from the OpenCV library to obtain the width and height of video frames.

· Initialize Timer and FPS: Initialize the timer and frame rate (FPS) using variables prev_time and fps.

· Loop to Read Video Frames and Process: Continuously read video frames captured by the camera using a while loop.

· Pass Frames to the Model for Prediction: Use the YOLO model to perform object detection on each frame, and store the detection results in the variable results.

· Process Prediction Results and Draw on Frames:

· Convert the prediction results to a PIL image, then to a NumPy array.

· Get the bounding boxes, confidences, and class IDs of detections, and iterate to draw bounding boxes and labels.

· Calculate and Draw FPS: Calculate the frames per second (FPS) processed per second, and draw it in the top left corner of the video frame.

· Display Frames and Exit the Loop: Display the processed video frames using the imshow function from OpenCV while listening for keyboard input. If the 'Q' key is pressed, exit the loop.

· Release the Camera and Close the Window: Release the camera resources and close the OpenCV window.
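
The capture/predict/draw loop mirrors yolo.py above; the essential difference is the segmentation model, whose results carry masks in addition to boxes. A quick single-image check (an illustrative sketch; test.jpg is a placeholder path, not a file from the repository):

CODE
# Single-image sanity check of the segmentation model.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")
frame = cv2.imread("test.jpg")        # placeholder: any local test image
results = model(frame)

annotated = results[0].plot()         # plot() overlays masks as well as boxes
masks = results[0].masks              # per-instance segmentation masks
print(masks.data.shape if masks is not None else "no masks detected")
cv2.imwrite("segmented.jpg", annotated)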

 

3. Running Result: 

When using the CPU on the Mu, the frame rate (FPS) for object segmentation with YOLOv8n ranges from roughly 2 to 5.

 


Figure 3. Running YOLOv8n object segmentation model on LattePanda Mu x86 computer module

 

Pose Estimation (CPU)

1. Download the official model (included in the GitHub folder).

https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-pose.pt

 

YOLOv8 offers a total of six pose estimation models, all trained on the COCO dataset. Their performance is shown below. YOLOv8x-pose-p6 is a variant of YOLOv8x that uses a larger input size (1280 pixels), providing higher accuracy at the cost of more computational resources.

 


Table 4. Performance comparison of YOLOv8 pose estimation models of different sizes

 

2. Run the yolo_pos.py file

CODE
python yolo_pos.py

Code Explanation (a sketch of the differences from yolo.py follows this list):

· Load YOLO Model: Instantiate the YOLOv8n model (yolov8n-pose.pt file) for human pose detection.

· Open Local Video File: Use the VideoCapture function from OpenCV to open the local video file named "people.mp4". This can be changed to a network camera if desired.

· Get the Width and Height of Video Frames: Use the get method from OpenCV to obtain the width and height of video frames.

· Initialize Timer and FPS: Initialize the timer and frame rate (FPS) using variables prev_time and fps.

· Loop to Read Video Frames and Process: Continuously read each frame from the video file using a while loop.

· Pass Frames to the Model for Prediction: Use the YOLO model to perform human pose detection on each frame, specifying CPU usage explicitly.

· Process Prediction Results and Draw on Frames: Convert the prediction results to a PIL image, then to a NumPy array. Get the bounding boxes, confidences, and class IDs of detections, and iterate to draw bounding boxes and labels.

· Calculate and Draw FPS: Calculate the frames per second (FPS) processed per second, and draw it in the top left corner of the video frame.

· Resize Frames: Use the resize function from OpenCV to adjust the size of the frame to 640x480 pixels.

· Display Frames and Exit the Loop: Display the processed video frames using the imshow function from OpenCV while listening for keyboard input. If the 'Q' key is pressed, exit the loop.

· Release the Video File and Close the Window: Release the resources of the video file and close the OpenCV window.
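
The loop itself is the one sketched for yolo.py; the distinctive pieces are the pose model, the local video source, the explicit CPU device, and the final resize (an illustrative single-frame sketch, not the exact repository script):

CODE
# Distinctive pieces of the pose pipeline, demonstrated on one frame.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")
cap = cv2.VideoCapture("people.mp4")        # local video file as the source

ret, frame = cap.read()
if ret:
    results = model(frame, device="cpu")    # force CPU inference explicitly
    annotated = results[0].plot()           # plot() draws skeleton keypoints
    annotated = cv2.resize(annotated, (640, 480))   # fixed display size
    cv2.imshow("YOLOv8n pose", annotated)
    cv2.waitKey(0)

cap.release()
cv2.destroyAllWindows()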

 

If you want to use a camera instead of the local video file, change the line:

CODE
cap = cv2.VideoCapture("people.mp4")

into:

CODE
cap = cv2.VideoCapture(0)

3. Running Result: 

When using the CPU on the Mu, the frame rate (FPS) for human pose detection with YOLOv8n ranges from roughly 3 to 6.

 


Figure 4. Running YOLOv8n pose estimation model on LattePanda Mu x86 computer module

 

ONNX Conversion

The YOLOv8n model supports export to different formats, such as ONNX and CoreML; the exportable formats are shown in the table below. Select one with the format parameter, e.g. format='onnx' or format='engine'. You can also predict or validate directly on the exported model, e.g. yolo predict model=yolov8n.onnx. Once the export completes, usage examples for the model are printed to the terminal.
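
For example, using the Ultralytics command line (a sketch; source=0 selects the default camera):

CODE
yolo export model=yolov8n.pt format=onnx
yolo predict model=yolov8n.onnx source=0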

 


Figure 5. YOLOv8 export format

 

In this article, the model is exported to the .onnx format, which offers good cross-platform compatibility and deployment flexibility. The model conversion code is as follows:

 

CODE
from ultralytics import YOLO

# Load the official pre-trained model
model = YOLO("yolov8n.pt")

# Export to ONNX with dynamic input shapes enabled
model.export(format="onnx", dynamic=True)

The output is as follows:


Figure 6. Successfully converted the model to ONNX format

 

We also tested the ONNX model on the Mu (onnxruntime is installed automatically on the first run). Under the current test environment and hardware, with the input image size (inference size) unchanged, the speed difference between the .pt model and the ONNX model is not significant, so we will not elaborate on it further.
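
For reference, the exported model can be run through the same Ultralytics API (a sketch; when given an .onnx file, Ultralytics dispatches inference to onnxruntime):

CODE
from ultralytics import YOLO

# Load the exported ONNX model; inference runs through onnxruntime.
onnx_model = YOLO("yolov8n.onnx")
results = onnx_model("https://ultralytics.com/images/bus.jpg")  # any image source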

 


Figure 7. Running ONNX model on LattePanda Mu with original image size

 

Using OpenVINO Optimization

After testing the native model, we found considerable room to improve inference speed in real-time scenarios. We therefore tested the YOLOv8n model optimized with OpenVINO on the Mu's integrated Intel GPU. With no significant loss in model accuracy, the inference speed improved substantially.

 

Environment Configuration

1. Install Anaconda 

Visit the official website, download the installer for your operating system, and follow the setup wizard to complete the installation.

 

2. Install Git

 

3. Install Microsoft Visual C++ Redistributable

 

4. Create a Conda environment and specify the Python version in Anaconda Prompt.

CODE
conda create -n yolov8 python=3.8
conda activate yolov8

Note: It is recommended to specify Python 3.8, as Python 3.11 may have compatibility issues with the YOLOv8 library and can complicate subsequent custom-data training.

 

5. Install OpenVINO (using Windows environment as an example)

Clone the notebook repository from GitHub to your local machine.

CODE
git clone --depth=1 https://github.com/openvinotoolkit/openvino_notebooks.git
cd openvino_notebooks

Install the OpenVINO environment:

CODE
python -m pip install --upgrade pip wheel setuptools
pip install -r requirements.txt

6. Open the YOLOv8 notebook in the terminal

CODE
jupyter lab notebooks/yolov8-optimization

You can then open the YOLOv8 optimization notebook in JupyterLab and run it locally.


Object Detection (CPU/GPU)

Using YOLOv8n for INT8 Quantization
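
Under the hood, the notebook performs post-training INT8 quantization with NNCF (Neural Network Compression Framework). Heavily condensed, the core step looks roughly like the following, where calibration_loader and transform_fn are placeholders for the notebook's calibration data pipeline and the IR paths are illustrative:

CODE
# Condensed sketch of NNCF INT8 post-training quantization of an OpenVINO IR.
# `calibration_loader` and `transform_fn` are placeholders for the notebook's
# calibration pipeline; the model paths are also illustrative.
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model("yolov8n.xml")                # FP32 OpenVINO IR

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
quantized_model = nncf.quantize(ov_model, calibration_dataset)

ov.save_model(quantized_model, "yolov8n_int8.xml")       # INT8 IR for deployment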

Explanation of the accuracy metrics:

· Precision: The accuracy of the model in recognizing relevant objects.

· Recall: Measures the model's ability to detect all ground truth objects.

· mAP@t: Mean average precision, represented as the area under the precision-recall curve aggregated over all classes in the dataset, where t is the intersection-over-union (IoU) threshold, i.e., the required degree of overlap between ground-truth and predicted objects. mAP@.5 therefore denotes the mean average precision at an IoU threshold of 0.5, while mAP@.5:.95 is the same metric averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (see the formula sketch after this list).
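
Writing AP_i(t) for the area under the precision-recall curve of class i at IoU threshold t, the two metrics can be expressed as follows (a notational sketch of the definitions above, for a dataset with N classes):

CODE
% mAP at a single IoU threshold t, averaged over the N classes:
\mathrm{mAP}@t = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i(t)

% mAP@.5:.95 additionally averages over the ten IoU thresholds 0.5, 0.55, ..., 0.95:
\mathrm{mAP}@.5{:}.95 = \frac{1}{10} \sum_{k=0}^{9} \mathrm{mAP}@(0.5 + 0.05k)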

 


Figure 8. Verification accuracy of YOLOv8n model

 


Figure 9. Accuracy after INT8 quantization with NNCF

 

Running Result

After INT8 quantization, the frame rate (FPS) for object detection with YOLOv8 on the CPU of the Mu ranges from roughly 7 to 9.

 


Figure 10. Running YOLOv8n object detection model on LattePanda Mu CPU with OpenVINO optimization

 

GPU

After INT8 quantization, the frame rate (FPS) for object detection with YOLOv8 on the integrated GPU of the Mu ranges from roughly 15 to 20.
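
Switching inference between the CPU and the integrated GPU is a one-argument change when compiling the model in OpenVINO (a sketch; the IR path is illustrative):

CODE
import openvino as ov

core = ov.Core()
model = core.read_model("yolov8n_int8.xml")
compiled_model = core.compile_model(model, "GPU")  # use "CPU" for CPU inference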

 


Figure 11. Running YOLOv8n object detection model on LattePanda Mu GPU with OpenVINO optimization

 

Object Segmentation (CPU/GPU)

Accuracy comparison after model quantization


Figure 12. Accuracy after INT8 quantization with NNCF

 

CPU

After INT8 quantization, the frame rate (FPS) for object segmentation with YOLOv8 on the CPU of the Mu ranges from roughly 5 to 7.

 


Figure 13. Running YOLOv8n object segmentation model on LattePanda Mu CPU with OpenVINO optimization

 

GPU

After INT8 quantization, the frame rate (FPS) for object segmentation with YOLOv8 on the integrated GPU of the Mu ranges from roughly 12 to 14.

 


Figure 14. Running YOLOv8n object segmentation model on LattePanda Mu GPU with OpenVINO optimization

 

Pose Estimation (CPU/GPU)

INT8 quantization accuracy comparison


Figure 15. Accuracy after INT8 quantization with NNCF

 

CPU

After INT8 quantization, the frame rate (FPS) for human keypoint detection with YOLOv8 on the CPU of the Mu ranges from roughly 8 to 9.


Figure 16. Running YOLOv8n pose estimation model on LattePanda Mu CPU with OpenVINO optimization

 

GPU

After INT8 quantization, the frame rate (FPS) for human keypoint detection with YOLOv8n on the integrated GPU of the Mu ranges from roughly 19 to 22.


Figure 17. Running YOLOv8n pose estimation model on LattePanda Mu GPU with OpenVINO optimization

 

Conclusion

The frame rate (FPS) results of running YOLOv8n in its various configurations on the LattePanda Mu core board, for object detection, segmentation, and pose estimation, are as follows:

 


Table 5. The frame rate (FPS) results of running YOLOv8n in different ways on the LattePanda Mu
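
In text form, the ranges reported in this article are:

Task                  CPU (native .pt)   CPU (OpenVINO INT8)   iGPU (OpenVINO INT8)
Object detection      4-7 FPS            7-9 FPS               15-20 FPS
Object segmentation   2-5 FPS            5-7 FPS               12-14 FPS
Pose estimation       3-6 FPS            8-9 FPS               19-22 FPS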

 

Note that taking screenshots while recording data can reduce the FPS; in actual use, you can add roughly 1 to 2 FPS to the ranges above.

 

If this article proves helpful, we invite you to stay tuned for more updates from us on Discord and social media. Up next, we'll test YOLOv10 on the LattePanda Mu, and we hope you'll look forward to it.

 

References

1. OpenVINO Notebooks on GitHub: https://github.com/openvinotoolkit/openvino_notebooks

2. Ultralytics official website: https://www.ultralytics.com

 

 

 

 

 

 
