OpenVINO Running on LattePanda 3 Delta Single Board Computer (4) - Pose Estimation
Introduction
Pose estimation of the human body is a critical task in the field of computer vision. By analyzing the pose information of individuals in images or videos, various applications such as human action recognition, behavior analysis, and virtual reality can be achieved. The OpenVINO platform provides robust functionality and tool support for human pose estimation tasks, including deep learning-based pose estimation models and efficient inference engines. These optimized models and engines enable fast and accurate pose estimation across different hardware platforms, offering developers a convenient way to deploy and perform inference.
At the core of OpenVINO's human pose estimation projects are deep learning models trained on large-scale pose datasets. These models locate human keypoints, such as joints, in each image or frame and estimate the pose they form. Tracking how the keypoints move from frame to frame is what makes applications such as human action recognition, sports training assistance, and human-computer interaction possible.
402-pose-estimation-webcam
This project is a demonstration of real-time human pose estimation using OpenVINO. It utilizes the OpenPose human pose estimation model available in the Open Model Zoo. The project allows for real-time estimation of human body poses from a webcam or video file and presents the results with visual overlays on the image. This functionality can be applied in various applications such as motion capture, sports analysis, and human-computer interaction.
Project Repository:
Testing Steps
1. Import necessary libraries and modules, such as cv2, numpy, etc.
2. Download the model: Use the download_file function to download the model files, including .xml and .bin files, from the model_url_dir.
3. Load the model: Initialize the runtime environment using the Core class of OpenVINO Runtime, read the model's network structure and weights files, and compile the model for execution on the specified device.
4. Process the results: Define auxiliary functions for processing network output results, including 2D pooling, non-maximum suppression, and pose decoding. These functions are used to convert the raw results into pose estimation results.
5. Draw pose overlays: Utilize the human pose estimation demo code from the Open Model Zoo to overlay the estimated poses on the image, visualizing the pose estimation. Key points are represented by circles, and limbs are represented by line segments.
6. Main processing function: Define a primary processing function, run_pose_estimation, for running pose estimation on the specified video source (webcam or video file). This function fetches frames from the video source, converts them into the model's input format, performs inference using the compiled model, and finally draws the results on the image and displays them.
7. Run real-time pose estimation: Call the run_pose_estimation function to perform real-time pose estimation. The user can choose to use a webcam as the video input or a video file.
8. Run pose estimation on a video file: If a webcam is not available, the project allows for pose estimation on a video file. The user can specify the path of the video file and optionally skip the first few frames.
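To make the "2D pooling" and "non-maximum suppression" part of step 4 more concrete, here is a minimal sketch of the general idea in plain Python. It is not the notebook's code: the function names max_pool_2d and heatmap_peaks, the threshold, and the toy heatmap are all assumptions made for illustration. The idea is that a heatmap cell counts as a keypoint candidate only if it is above a threshold and equal to the maximum of its own neighbourhood.

```python
def max_pool_2d(grid, kernel=3):
    """Same-size max pooling with stride 1; cells outside the grid are ignored."""
    rows, cols = len(grid), len(grid[0])
    half = kernel // 2
    pooled = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            pooled[r][c] = max(window)
    return pooled


def heatmap_peaks(heatmap, threshold=0.1, kernel=3):
    """Return (x, y, score) for cells that are local maxima above the threshold."""
    pooled = max_pool_2d(heatmap, kernel)
    peaks = []
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            # A cell survives suppression only if nothing nearby scores higher.
            if score > threshold and score == pooled[y][x]:
                peaks.append((x, y, score))
    return peaks


# Toy 5x5 heatmap for one keypoint type, with two people in view (made-up values).
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
print(heatmap_peaks(heatmap, threshold=0.5))
```

In this simple version, cells that tie for the maximum inside one window are all reported. A real implementation would normally work on numpy arrays for speed and follow up with the pose decoding step, which groups the surviving peaks into individual people.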
Test Results
- Runs smoothly, without visible lag, at more than 20 FPS.
- Utilizes OpenVINO's tools and libraries for visualizing pose estimation results, such as drawing keypoints, skeleton connections, and pose heatmaps.
- Makes efficient use of the hardware accelerator's computational capabilities, providing high-performance pose estimation inference.
- Supports multi-target pose estimation on the same screen and exhibits high accuracy in detections.
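One common way to arrive at an FPS figure like the one above is to average the processing time of the most recent frames and invert it. The sketch below shows that approach; the FpsMeter class and its window size are illustrative assumptions, not necessarily how the demo itself computes the number.

```python
import collections


class FpsMeter:
    """Rolling average of per-frame processing times (illustrative helper)."""

    def __init__(self, window=200):
        # Only the most recent `window` measurements are kept.
        self.times = collections.deque(maxlen=window)

    def add(self, seconds):
        """Record how long one frame took to process."""
        self.times.append(seconds)

    def fps(self):
        """Frames per second implied by the average processing time."""
        if not self.times:
            return 0.0
        average = sum(self.times) / len(self.times)
        return 1.0 / average if average > 0 else 0.0


meter = FpsMeter(window=3)
for seconds in (0.20, 0.05, 0.05, 0.05):  # the slow first frame drops out of the window
    meter.add(seconds)
print(f"{meter.fps():.1f} FPS")
```

In a real loop, each duration would be measured around the inference call, for example with time.perf_counter() before and after it. A rolling window keeps a slow start-up frame from dragging down the reported number.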
406-3D-pose-estimation-webcam
This project is a demonstration of 3D multi-person pose estimation. The models used in this demo are based on Lightweight OpenPose and Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB. It can detect up to 18 different keypoint types, including ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, ankles, along with their corresponding 3D coordinates, which can be used to construct a 3D representation of human body poses.
Project Repository:
Testing Steps
1. Install necessary dependencies, including the pythreejs library, using the command "!pip install pythreejs".
2. Import required modules and libraries, such as OpenVINO's runtime library, OpenCV, ipywidgets, etc.
3. Download the models using the omz_downloader command-line tool and convert the chosen models into OpenVINO IR format.
4. Load the models by initializing the inference engine with OpenVINO Runtime, reading the network architecture and model weights from files, compiling the models, and creating inference requests.
5. Perform model inference to obtain output results for the input image, including features, heatmaps, and part affinity fields.
6. Draw 2D pose overlays by defining the connections between certain joints and visualizing the pose structure on the resulting image.
7. Define the main processing function, which runs the core logic of 3D pose estimation: fetching frames from the webcam or video file, preprocessing and resizing the images, performing model inference, parsing the inference results, drawing 3D poses, and drawing 2D pose overlays.
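As a rough illustration of step 6, the sketch below turns a list of detected keypoints into the line segments that make up a 2D skeleton overlay. The edge list, the keypoint ordering, the score cutoff, and the function name limb_segments are all hypothetical; the real demo defines its own joint connections.

```python
# Hypothetical joint connections for a single arm: the indices refer to
# positions in the keypoint list below, not to the model's real ordering.
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3)]  # neck-shoulder, shoulder-elbow, elbow-wrist


def limb_segments(keypoints, edges, min_score=0.1):
    """Return ((x1, y1), (x2, y2)) pixel segments for limbs with two confident ends.

    `keypoints` is a list of (x, y, score) tuples, one per joint.
    """
    segments = []
    for a, b in edges:
        xa, ya, score_a = keypoints[a]
        xb, yb, score_b = keypoints[b]
        # Skip the limb if either joint was not detected reliably.
        if score_a >= min_score and score_b >= min_score:
            segments.append(((round(xa), round(ya)), (round(xb), round(yb))))
    return segments


# Made-up detections: the elbow (index 2) has a very low score.
keypoints = [
    (10.2, 20.7, 0.9),   # neck
    (15.0, 30.0, 0.8),   # shoulder
    (18.0, 40.0, 0.05),  # elbow
    (20.0, 50.0, 0.7),   # wrist
]
print(limb_segments(keypoints, SKELETON_EDGES))
```

Each returned segment could then be passed to a drawing call such as OpenCV's cv2.line, with the joints themselves drawn as circles.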
Test Results
- Inference on the CPU may lag, but on the GPU the demo runs at around 20 FPS and handles video noticeably better.
- The system can accurately and continuously detect human keypoints and render 3D images.
403-action-recognition-webcam
This project utilizes Action Recognition Models from the Open Model Zoo, consisting of an Encoder and a Decoder. Together, these two models form a sequence-to-sequence (seq2seq) system for recognizing the human activities defined in the Kinetics-400 dataset. The models employ the Video Transformer method with a ResNet34 encoder.
Project Repository:
Testing Steps
- Import necessary libraries and modules.
- Download the required models.
- Load the labels for activity recognition.
- Load the Encoder and Decoder models.
- Implement auxiliary functions for preprocessing and postprocessing.
- Implement AI functions for preprocessing, Encoder inference, Decoder inference, and softmax normalization.
- Implement the main processing function, including video playback, frame preprocessing, Encoder inference, Decoder inference, and result visualization.
- Provide interfaces to run the system on video files and webcam.
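The softmax normalization mentioned in the steps above can be sketched in a few lines. This is a generic, numerically stable softmax followed by a top-k lookup, not the notebook's own code; the function names, the logits, and the three labels are placeholders chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw decoder scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(probabilities, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in order[:k]]


# Placeholder values: a real decoder outputs one score per Kinetics-400 class.
labels = ["scratching head", "grabbing neck", "waving hand"]
logits = [2.0, 0.5, 1.0]

probabilities = softmax(logits)
for label, probability in top_k(probabilities, labels, k=2):
    print(f"{label}: {probability:.2%}")
```

The top-ranked labels and their probabilities are what a demo like this would typically overlay on each video frame.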
Test Results
- Running on the CPU may cause lag, but switching to the GPU improves performance and provides a smoother experience.
- Accuracy may vary for predicting complex actions, but the system can accurately predict simple actions such as scratching the head or grabbing the neck.
Summary
By using OpenVINO for pose estimation, we can accurately estimate the joint positions and pose information of the human body from images or videos. The test summaries for the three projects are as follows:
Among the mentioned projects, I personally believe that 406-3D-pose-estimation-webcam has more versatile applications and represents a significant leap forward from traditional pose estimation. It allows capturing poses and replicating them in a 3D space, facilitating further development and applications.
As the technology continues to advance, we can expect the OpenVINO team to provide further optimized pose estimation projects in the future.
In addition to the mentioned projects, OpenVINO offers a wealth of other functionalities. You might be interested in exploring the following articles:
OpenVINO Running on LattePanda 3 Delta Single Board Computer (1) - Object Detection
OpenVINO Running on LattePanda 3 Delta Single Board Computer (2) - Text Recognition
OpenVINO Running on LattePanda 3 Delta Single Board Computer (3) - NLP
OpenVINO Running on LattePanda 3 Delta Single Board Computer (5) - Audio Processing