OpenVINO Running on Single Board Computer (1) - Object Detection, utilizing YOLO, SSD, SORT, etc.

Introduction

This article introduces an object detection project built with OpenVINO (Open Visual Inference & Neural Network Optimization) on the LattePanda 3 Delta platform, along with its application scenarios. Object detection is a core task in computer vision: it identifies and localizes multiple objects in images or video. OpenVINO supports a range of object detection algorithms and pre-trained models, including well-known ones such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN (Region-based Convolutional Neural Networks). These algorithms use convolutional neural networks to detect objects efficiently and accurately, and they are applied in domains such as intelligent surveillance, autonomous driving, industrial quality inspection, smart retail, and human-computer interaction. For instance, in intelligent surveillance systems, object detection supports real-time recognition of and alerting on anomalous behavior; in autonomous driving, it helps identify and avoid obstacles; and in industrial quality inspection, it helps detect product defects.

This article covers several prominent object detection projects from the OpenVINO Notebook, including YOLOv8, SSDLite, and SORT. Before looking at these projects, here is a brief overview of the OpenVINO Notebook and the single-board computer used in this testing:

OpenVINO Notebook

OpenVINO Notebook is a specialized application of OpenVINO, providing an interactive interface integrated with OpenVINO tools and libraries within the Jupyter Notebook environment. Through OpenVINO Notebook, users can easily access and utilize OpenVINO's functionalities, including model optimization, model conversion, and model inference. It offers a convenient way to explore and apply the capabilities of OpenVINO without the need to write extensive code.

OpenVINO Notebook offers a series of code examples and documentation that help users understand and use the various features of OpenVINO. Users can learn how to optimize models and run inference from the code and explanations in each notebook, then apply these techniques to real-world projects.

LattePanda 3 Delta

LattePanda 3 Delta is a compact x86 Windows-based single board computer (SBC) featuring an Intel 11th-generation quad-core, four-thread mobile processor (N5105) with a burst frequency of up to 2.9 GHz. Compared to its predecessor, CPU performance is up to 2 times faster and GPU (Intel UHD Graphics) performance is up to 3 times faster. The board can play 4K HDR video smoothly and can even run some resource-intensive games.

230-YOLOv8-Optimization

YOLO (You Only Look Once) is a deep learning-based object detection algorithm. Compared to traditional object detection methods, YOLO is generally faster and more accurate. Its core idea is to treat object detection as a regression problem: the image is divided into a grid of cells, and bounding boxes and class probabilities are predicted for each cell. Each bounding box carries the position and size of an object, together with the probability that the object belongs to each class.

In the original YOLO design, the network is a convolutional neural network that extracts image features through multiple convolutional and pooling layers, followed by fully connected layers that produce the regression and classification predictions. Because the entire network needs only a single forward pass to obtain all bounding boxes and class probabilities, YOLO is highly efficient and capable of real-time object detection.
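To make the grid idea concrete, here is a minimal sketch of how one grid-cell prediction could be turned into a pixel-space box and a class label. It assumes the parameterization of the original YOLO design (center offsets relative to the cell, width and height relative to the image); newer versions such as YOLOv8 use a different output head, and the function names, grid size, and class names below are illustrative only.

```python
def decode_cell(row, col, pred, grid_size, image_size):
    """Turn one grid-cell prediction into a pixel-space box (x1, y1, x2, y2).

    pred = (x_offset, y_offset, width, height), all in [0, 1]:
    offsets are relative to the cell, width/height to the whole image.
    """
    x_off, y_off, w, h = pred
    cell = image_size / grid_size      # side length of one cell in pixels
    cx = (col + x_off) * cell          # box center, x
    cy = (row + y_off) * cell          # box center, y
    bw, bh = w * image_size, h * image_size
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


def best_class(class_probs, names):
    """Return the most probable class name and its probability."""
    idx = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return names[idx], class_probs[idx]


# Hypothetical prediction for the center cell of a 7 x 7 grid on a 448 x 448 image.
box = decode_cell(row=3, col=3, pred=(0.5, 0.5, 0.25, 0.25),
                  grid_size=7, image_size=448)
label = best_class([0.1, 0.7, 0.2], ["person", "car", "dog"])
print(box, label)
```

A full detector repeats this for every cell and every predicted box, then discards low-confidence and overlapping results.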

Project Link

https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/230-yolov8-optimization/230-yolov8-optimization.ipynb

Testing Steps

This testing includes the following steps:

1. Download and prepare the dataset (COCO).

2. Validate the original model.

3. Convert the PyTorch model to OpenVINO Intermediate Representation (IR).

4. Validate the converted model.

5. Quantize the model to 8-bit (object detection model only, because conversion of the segmentation model failed).

6. Run the OpenVINO model in real time on a CV video stream.
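The 8-bit quantization in step 5 can be pictured with a small, self-contained sketch. This is not the notebook's quantization pipeline, which calibrates on real data and is considerably more elaborate; it only illustrates the underlying idea of mapping floating-point values to signed 8-bit integers with a single scale factor. All names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / 127
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.25, 1.0]          # made-up FP32 values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per value is bounded by half the scale, which is why a well-calibrated INT8 model can stay close to the FP32 results while using a quarter of the storage per weight.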

Because the dataset is large, the accuracy section of the notebook takes a long time to run, and the single-board computer may freeze. To reduce computation time, the evaluation subset size (the num_samples parameter, NUM_TEST_SAMPLES in the notebook) was reduced from 300 to 200. Because the validation subset differs, the measured accuracy may not match the results reported by the model authors. To validate on the full original dataset instead, set NUM_TEST_SAMPLES = None.
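The effect of evaluating on a subset can be sketched in a few lines. Only the name NUM_TEST_SAMPLES and the convention that None means "use the whole dataset" come from the text above; the helper functions and the toy data are hypothetical and stand in for the notebook's real validation loop.

```python
from itertools import islice

NUM_TEST_SAMPLES = 200  # value used in this test; None means "use everything"


def limit_samples(data_loader, num_samples=None):
    """Yield at most num_samples items; yield all of them when it is None."""
    return islice(data_loader, num_samples)


def evaluate(data_loader, is_correct, num_samples=None):
    """Return (fraction of correct samples, number of samples evaluated)."""
    results = [is_correct(s) for s in limit_samples(data_loader, num_samples)]
    return sum(results) / len(results), len(results)


# Toy "dataset" of 500 samples in which only the first 150 are predicted correctly.
subset_result = evaluate(range(500), lambda s: s < 150, NUM_TEST_SAMPLES)
full_result = evaluate(range(500), lambda s: s < 150, None)
print(subset_result, full_result)
```

The two scores differ even though the "model" is the same, which is the reason accuracy measured on 200 samples should not be compared directly with figures published for the full validation set.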

Test Results

In single-image testing, the original model, the IR model, and the INT8 model produce similar annotation results.

Comparison of the YOLOv8n PyTorch original model and the converted OpenVINO IR model on LattePanda 3 Delta (NUM_TEST_SAMPLES = 200)

Comparison of the FP32 and quantized INT8 models on the dataset (NUM_TEST_SAMPLES = 200)

Image annotation time for the OpenVINO IR model on LattePanda 3 Delta is under 2 s on the CPU and under 4 s on the GPU. Video stream detection reaches 22.2 FPS: the rate starts at about 2 FPS while the GPU is loading (which completes within 10 seconds) and then rises above 20 FPS.
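Because the frame rate starts low and only stabilizes after the warm-up, a reported FPS figure depends on how it is averaged. The following sketch shows one common approach, a sliding-window average over recent frames; the class and its parameters are hypothetical and are not taken from the notebook. The timestamps are synthetic so that the example runs without a camera.

```python
import time
from collections import deque


class FpsMeter:
    """Average FPS over the most recent `window` frames."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Simulate 4 slow warm-up frames (0.5 s each) followed by 8 fast frames (0.05 s each).
meter = FpsMeter(window=5)
t = 0.0
readings = []
for dt in [0.5] * 4 + [0.05] * 8:
    t += dt
    readings.append(meter.tick(now=t))
print([round(r, 1) for r in readings])
```

With a short window, the slow warm-up frames drop out of the average after a few frames, so the reading converges to the steady-state rate instead of being dragged down by the start-up phase.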

YOLOv8 exhibits high annotation accuracy but has room for improvement in handling small and multiple objects.

While maintaining a high frame rate, it can accurately detect and locate multiple objects.

YOLOv8 benefits from hardware acceleration: running inference on the GPU improves speed compared with the CPU, where inference runs at around 16 to 18 FPS.
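Switching between CPU and GPU inference in OpenVINO comes down to the device name passed when the model is compiled. The sketch below is a hedged illustration, not the notebook's code: pick_device and compile_on_best_device are hypothetical helpers, and the "import openvino as ov" form assumes a recent OpenVINO release (older releases expose the same Core class under openvino.runtime).

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device type present in `available`.

    Device names may carry an index suffix such as "GPU.0", so match on prefix.
    """
    for want in preferred:
        for name in available:
            if name == want or name.startswith(want + "."):
                return name
    raise RuntimeError(f"none of {preferred} found in {available}")


def compile_on_best_device(model_xml):
    """Compile an IR model on the GPU if one is present, otherwise on the CPU."""
    import openvino as ov  # imported lazily so pick_device works without OpenVINO

    core = ov.Core()
    device = pick_device(core.available_devices)
    compiled = core.compile_model(core.read_model(model_xml), device)
    return compiled, device


# The selection logic can be checked without any hardware:
print(pick_device(["CPU", "GPU"]), pick_device(["CPU"]))
```

On a board like the one tested here, the integrated graphics would be expected to show up as a GPU device, and the CPU remains the fallback when no GPU plugin is available.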

In addition to YOLOv8, the OpenVINO Notebook also provides projects for YOLOv7 and YOLOv5. If you need precise detection of small objects, both YOLOv5 and YOLOv8 are effective choices. If you need to deploy on devices without GPU support, YOLOv5 might be the more suitable option; if you prioritize accuracy and have GPU support, YOLOv8 could be the more valuable solution. The following is a comparison of the accuracy of two quantized versions of YOLOv5 on LattePanda 3 Delta:

218-Vehicle-Detection-and-Recognition

This project utilizes two pre-trained models from the Open Model Zoo: "vehicle-detection-0200" for object detection and "vehicle-attributes-recognition-barrier-0039" for image classification. By using these models, it is possible to detect vehicles in original images and recognize the attributes of the detected vehicles.

Project Link:

https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/218-vehicle-detection-and-recognition

Testing Steps

1. Import Necessary Modules: Import the required Python modules and libraries, including OpenVINO Runtime, OpenCV, NumPy, and Matplotlib.

2. Download Models: Use the "omz_downloader" command-line tool to download the pre-trained models from the Open Model Zoo. Two models are needed: one for object detection (vehicle-detection-0200) and one for image classification (vehicle-attributes-recognition-barrier-0039). The models are used in OpenVINO Intermediate Representation (OpenVINO IR) format.

3. Load Models: Use OpenVINO Runtime to read each model's network structure and weight files (.xml and .bin) and compile the model for the specified device (in this case, the CPU).

4. Get Model Properties: Obtain the input shape of each model for later use.

5. Helper Function: Define the auxiliary function "plt_show" for displaying images.

6. Read and Display Test Images: Download the test image and read the image data. Because the input shape of the detection model is [1, 3, 256, 256], the image must be resized to 256 x 256 and rearranged into that layout (channels first, with an added batch dimension) before inference.

7. Detect Vehicles Using the Detection Model: Run the detection model to obtain the detection results, a set of bounding boxes that each carry the position and confidence of a detected vehicle.

8. Detection Processing: From the output of the detection model, calculate the actual positions of the vehicles in the image and filter out low-confidence results.

9. Recognize Vehicle Attributes: Select a detected vehicle, crop the region containing it, resize the crop to the recognition model's input size, and run inference to obtain the attribute predictions.

10. Recognition Processing: Parse the output of the recognition model to obtain the vehicle's color and type. For each attribute, the class with the highest probability is taken as the final result.

11. Integration of Both Models: Combine the detection model and the recognition model into a single pipeline for vehicle detection and attribute recognition. Using the bounding boxes from the detection model, draw rectangles and label the vehicle attributes on the image.
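The post-processing in steps 8 and 10 can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes each detection row has the form (image_id, label, confidence, x_min, y_min, x_max, y_max) with normalized coordinates, that the recognition model returns one probability per color and per type, and that the label lists below match the model's output order. The function names and the 0.6 threshold are hypothetical.

```python
COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]  # assumed order
TYPES = ["car", "bus", "truck", "van"]                                 # assumed order


def to_pixel_boxes(detections, image_width, image_height, threshold=0.6):
    """Keep confident detections and scale their boxes to pixel coordinates."""
    boxes = []
    for _, _, conf, x_min, y_min, x_max, y_max in detections:
        if conf < threshold:
            continue  # drop low-confidence results
        boxes.append((round(x_min * image_width), round(y_min * image_height),
                      round(x_max * image_width), round(y_max * image_height)))
    return boxes


def top_attribute(probabilities, names):
    """Return the name whose probability is highest."""
    idx = max(range(len(names)), key=lambda i: probabilities[i])
    return names[idx]


# Made-up model outputs for a 1000 x 500 image.
detections = [(0, 0, 0.9, 0.1, 0.2, 0.5, 0.8),
              (0, 0, 0.3, 0.6, 0.6, 0.9, 0.9)]
boxes = to_pixel_boxes(detections, image_width=1000, image_height=500)
color = top_attribute([0.05, 0.10, 0.05, 0.70, 0.02, 0.05, 0.03], COLORS)
vehicle_type = top_attribute([0.1, 0.1, 0.7, 0.1], TYPES)
print(boxes, color, vehicle_type)
```

In the combined pipeline, each surviving box would be cropped from the original image, passed to the recognition model, and labeled with the resulting color and type.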

Test Results

The model files are very small, approximately 4 MB, making it convenient to use even without a GPU.

The detection accuracy is relatively high, and it can quickly identify the color characteristics of all vehicles, even in the presence of multiple vehicles.

401-Object-Detection

This project demonstrates object detection using SSDLite MobileNetV2 from the Open Model Zoo with OpenVINO.

Project Link:

Testing Steps

1. Download Model: Download the model from the specified URL using the "download_file" function and save it to the local file system.

2. Extract Model: Use the "tarfile" library to extract the downloaded model archive and obtain the path to the model.

3. Convert Model: Convert the pre-trained TensorFlow model to OpenVINO Intermediate Representation (IR) format using the "mo.convert_model" function from the Model Optimizer Python API. (The kernel may hang during this step; in that case, place pre-converted model files directly in the "model" folder.)

4. Load Model: Initialize OpenVINO Runtime, read the IR model file, and compile the model for the specified device using the "compile_model" method.

5. Process Results: Define a function to process the detection results, including converting normalized coordinates to pixel coordinates and applying non-maximum suppression.

6. Draw Bounding Boxes: Define a function that draws the boxes and their labels.

7. Main Processing Function: Define the main processing function, which contains the core object detection loop: reading video frames, preprocessing, running inference, post-processing, and drawing bounding boxes.

8. Run Object Detection: Choose either the camera or a video file as the input source and call the main processing function.
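The non-maximum suppression mentioned in step 5 removes duplicate boxes that cover the same object. The following is a minimal pure-Python sketch of the standard greedy algorithm, written for illustration; the notebook itself may rely on a library routine instead, and the 0.5 threshold is an assumed value.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first by more than the threshold and is dropped, while the third box is far away and survives.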

Test Results

The kernel may hang during model conversion. In that case, the model has to be converted manually and the resulting files copied back to the 3 Delta.

The FPS (frames per second) is around 25 when running on the CPU and can reach 70+ when using the GPU.

Object recognition is relatively accurate, but because the model covers a limited set of classes, it is best suited to tracking and recognizing common objects.

407-Person-Tracking-Webcam

This project utilizes the SORT algorithm for object tracking, specifically for tracking people in real-time using a webcam. As it focuses on detecting only human bodies, it is more suitable for specific applications like warehouse management, security surveillance, etc.

Project Link:

Testing Steps

1. Import Necessary Libraries and Modules: Import the required libraries and modules, including OpenVINO, NumPy, cv2 (OpenCV), and Matplotlib.

2. Download Required Models: Download the person detection model and the person re-identification model using the "omz_downloader" tool, specifying the model names and download paths.

3. Load Models: Define a generic class for model loading and inference, which initializes the OpenVINO Runtime, reads the models, and compiles them.

4. Define Data Processing Functions: The preprocessing function converts and resizes the input data, while the post-processing function extracts the relevant information from the model's output and visualizes the results.

5. Test Person Re-Identification Model: Load test images, preprocess them, run inference, and calculate the cosine distance between the results to compare the similarity of two individuals.

6. Define the Main Processing Function (run_person_tracking): This function runs the entire person tracking process: creating a video player, preparing frames, running AI inference, and visualizing the results.
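The cosine distance used in step 5 can be computed in a few lines, and it also shows how re-identification helps keep IDs stable across frames. The sketch below is a simplified stand-in: SORT itself relies on Kalman-filter motion prediction and Hungarian assignment, whereas assign_ids here is a hypothetical greedy matcher on appearance embeddings only, with an assumed distance threshold of 0.3 and made-up two-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def assign_ids(track_embeddings, detection_embeddings, max_distance=0.3):
    """Greedily give each detection the ID of the closest unused track.

    track_embeddings maps track ID -> embedding. The result holds one track ID
    per detection, or None when no remaining track is close enough.
    """
    unused = dict(track_embeddings)
    result = []
    for emb in detection_embeddings:
        best = min(unused, key=lambda tid: cosine_distance(unused[tid], emb),
                   default=None)
        if best is not None and cosine_distance(unused[best], emb) <= max_distance:
            result.append(best)
            del unused[best]
        else:
            result.append(None)  # treated as a new, unseen person
    return result


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
new_detections = [[0.1, 0.9], [-1.0, 0.0]]
ids = assign_ids(tracks, new_detections)
print(ids)
```

The first detection points almost in the same direction as track 2 and inherits its ID; the second is far from the remaining track and would start a new identity.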

Test Results

Achieves approximately 40 FPS on 3 Delta.

Recognition is fast and accurate.

It can accurately distinguish different individuals, which makes it promising for applications such as security surveillance.

Summary

This testing covered four object detection-related projects from OpenVINO, including YOLO, SSD, SORT, and others. These projects represent various object detection algorithms and are applicable to scenarios such as multi-object detection, vehicle detection, and pedestrian detection. The summarized results are shown in the table below:

Function	General Object Detection	Vehicle Detection	General Object Detection	Person Tracking	Object Detection Segmentation
Webcam FPS (on GPU)	22	none	24	40	Cannot run on LattePanda 3 Delta; a higher-performance SBC is recommended
Accuracy (1 to 5)*	5	4	3	4
GPU support	1	1	1	1
Recommended value	5	3	4	4

* Accuracy is a subjective rating by the author on a scale of 1 to 5. All four projects above achieved high recognition accuracy, so these values are relative and are intended only as a guide for recommendation.

During testing, the OpenVINO-based object detection projects performed well in both speed and accuracy. Through model optimization and hardware acceleration, inference on Intel integrated graphics becomes fast enough for real-time applications. The models also showed high accuracy across the tested scenarios, detecting and locating target objects reliably.

With the continuous advancement of hardware technology and algorithm optimization, object detection will play a crucial role in various fields. For instance, in the field of intelligent transportation, object detection will assist autonomous driving systems in achieving more precise environmental perception and obstacle recognition. In the realm of intelligent security, object detection will provide powerful real-time monitoring and intrusion detection capabilities. In sectors like intelligent retail and industrial quality inspection, object detection will contribute to automated and intelligent production and services.

If you also require object detection applications for industrial or production scenarios, you may consider prioritizing the aforementioned OpenVINO projects. With detailed documentation and comprehensively optimized models, they are expected to yield significant productivity gains.

In addition to the mentioned projects, OpenVINO offers a wealth of other functionalities. You might be interested in exploring the following articles:

OpenVINO Running on LattePanda 3 Delta Single Board Computer (2) - Text Recognition

OpenVINO Running on LattePanda 3 Delta Single Board Computer (3) - NLP

OpenVINO Running on LattePanda 3 Delta Single Board Computer (4) - Pose Estimation

OpenVINO Running on LattePanda 3 Delta Single Board Computer (5) - Audio Processing