OpenVINO Running on LattePanda 3 Delta Single Board Computer (2) - Text Recognition
Introduction
Text recognition is an essential technology that converts printed or handwritten text into editable and searchable electronic formats. In this test, we use the LattePanda 3 Delta single board computer to run several text recognition projects on the OpenVINO platform, including Optical Character Recognition (OCR) and speech-to-text technologies.
OCR
OCR technology analyzes images or scanned documents to extract printed text and convert it into machine-readable text. Using the OpenVINO platform, we deployed an OCR model and ran a series of tests and evaluations. In our tests, the OpenVINO-based OCR model extracted and recognized text from images accurately and quickly. This makes it useful for document processing, automated data entry, and information extraction.
Speech-to-Text (speech2text)
Speech-to-text technology converts speech signals into text. Using the speech recognition models and tools provided by the OpenVINO platform, we can transform speech input into machine-readable text. Speech recognition is widely used in voice assistants, transcription, and voice interaction. On the OpenVINO platform, we achieved efficient and accurate speech recognition, which supports real-time transcription and voice interaction.
These text recognition projects have extensive applications in practical scenarios. For example, in the field of office automation, text recognition can be used for automated document processing, electronic document conversion, and text extraction tasks. In the domains of intelligent customer service and speech assistants, text recognition enables automated speech transcription and intelligent voice interactions. Additionally, in large-scale data analysis and information extraction, text recognition facilitates rapid processing and analysis of textual data.
209-handwritten-ocr
This project provides a tutorial on how to use OCR for handwritten Japanese and Simplified Chinese characters.
Note: The models used in this project are specifically designed for recognizing handwritten Japanese and Simplified Chinese characters and do not currently support other languages.
Project Repository
Testing Steps
1、Import the necessary libraries and modules, including collections, itertools, pathlib, cv2, matplotlib, and numpy.
2、Set constants and folder paths, including model folder, data folder, and character list folder.
3、Define a Language named tuple containing fields for the model name, character list name, and demo image name.
4、Set the language variable based on the chosen language, either Chinese or Japanese.
5、Retrieve the corresponding file information from the languages dictionary based on the chosen language.
6、Download the model files using the omz_downloader command-line tool from the Open Model Zoo.
7、Load the network: use the Core class from OpenVINO to read and compile the model.
8、Obtain information about the input and output layers for further processing.
9、Load the image, read it in grayscale, and resize and pad it according to the requirements of the input layer.
10、Visualize the input image and display the processed image.
11、Prepare the character list by downloading the character list file and adding a blank symbol at the beginning of the list.
12、Run the inference by providing the input image and obtain the output results.
13、Process the output data to convert it into a more readable format, select the symbol with the highest probability, and remove consecutive duplicate symbols and blank symbols.
14、Print the output results and display the image with the predicted text.
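Steps 9 and 13 carry most of the logic in this project. The following is a minimal pure-Python sketch of both, not the notebook's actual code. `padded_geometry` computes the image width after scaling to the model's input height and the number of padding columns needed on the right. `ctc_greedy_decode` applies the highest-probability selection, duplicate collapsing, and blank removal described in step 13. The function names, the `~` blank placeholder, and the sample numbers are assumptions chosen for illustration.

```python
from itertools import groupby


def padded_geometry(image_h, image_w, input_h, input_w):
    """Return (resized_width, right_padding) for an aspect-preserving resize.

    The image is scaled so its height equals the model input height; the
    remaining columns up to the input width are filled with padding.
    """
    scale = input_h / image_h
    resized_w = round(image_w * scale)
    if resized_w > input_w:
        raise ValueError("image is too wide for the model input after scaling")
    return resized_w, input_w - resized_w


def ctc_greedy_decode(scores, letters, blank_index=0):
    """Greedy CTC decoding of per-time-step scores into text."""
    # 1. Pick the index of the highest score at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    # 2. Collapse runs of identical consecutive indices.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop the blank symbol and map the remaining indices to characters.
    return "".join(letters[i] for i in collapsed if i != blank_index)


# Hypothetical character list: blank symbol first (step 11), then the characters.
letters = ["~", "a", "b"]
# Hypothetical scores for five time steps over three symbols.
scores = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.7, 0.2],    # "a" again, merged with the previous step
    [0.9, 0.05, 0.05],  # blank, separates the two runs of "a"
    [0.1, 0.8, 0.1],    # "a"
    [0.2, 0.1, 0.7],    # "b"
]
print(ctc_greedy_decode(scores, letters))   # aab
print(padded_geometry(100, 400, 96, 2000))  # (384, 1616)
```

The blank symbol is what allows a genuinely repeated character to survive: the two runs of "a" are kept apart by the blank at the third time step, so the result is "aab" and not "ab".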
Test Results
- The system recognizes handwritten Chinese characters accurately.
- The response speed is fast.
208-optical-character-recognition
This project recognizes words in images using OCR technology. It uses the horizontal-text-detection-0001 and text-recognition-resnet models together: the first detects text regions, and the second recognizes the text in each region.
Project Repository
Testing Steps
1、Import the necessary libraries and modules.
2、Set up the OpenVINO runtime environment and model paths.
3、Download the models:
- Use the Model Downloader from the Open Model Zoo to download the detection and recognition models.
- If these models were previously downloaded, they will not be downloaded again.
4、Convert the downloaded models to OpenVINO IR format where needed.
- The detection model is an Intel model and is already in OpenVINO IR format, so no conversion is required.
- The recognition model is a public model and needs to be converted to the OpenVINO IR format using the Model Converter.
5、After successful conversion, use the OpenVINO models for OCR.
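Steps 3 and 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not the notebook's code: it only builds and prints the `omz_downloader` and `omz_converter` command lines and does not run them. The helper names, the `model` folder, the FP16 precision, and the recognition model name `text-recognition-resnet-fc` (assumed to be the Open Model Zoo name of the model this article calls text-recognition-resnet) are all assumptions.

```python
import shlex
from pathlib import Path


def ir_path(base_dir, source, model_name, precision="FP16"):
    """Expected location of a model's IR .xml file.

    `source` is "intel" for Intel models and "public" for public models.
    """
    return Path(base_dir) / source / model_name / precision / f"{model_name}.xml"


def downloader_command(model_name, output_dir, precision="FP16"):
    """Command line that downloads a model from the Open Model Zoo."""
    return ["omz_downloader", "--name", model_name,
            "--output_dir", str(output_dir), "--precisions", precision]


def converter_command(model_name, download_dir, output_dir, precision="FP16"):
    """Command line that converts a downloaded public model to OpenVINO IR."""
    return ["omz_converter", "--name", model_name, "--precisions", precision,
            "--download_dir", str(download_dir), "--output_dir", str(output_dir)]


model_dir = Path("model")                          # assumed folder name
detection = "horizontal-text-detection-0001"       # Intel model, already in IR format
recognition = "text-recognition-resnet-fc"         # assumed name of the public model

commands = []
# Only download the detection model when its IR file is not on disk yet.
if not ir_path(model_dir, "intel", detection).exists():
    commands.append(downloader_command(detection, model_dir))
# The public recognition model needs a download followed by a conversion.
if not ir_path(model_dir, "public", recognition).exists():
    commands.append(downloader_command(recognition, model_dir))
    commands.append(converter_command(recognition, model_dir, model_dir))

for command in commands:
    print(shlex.join(command))
```

To actually run the printed commands, each list could be passed to `subprocess.run(command, check=True)` on a machine where the Open Model Zoo tools are installed.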
Test Results
- Processing is fast: a single image takes less than 5 seconds.
- The recognition accuracy is high.
- The drawback is that it can only output individual words at a time and cannot provide complete sentences.
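Because each detected region is recognized on its own, the result is a set of words with positions. One possible way to join them into lines is sketched below. It is not part of the notebook: it assumes each result is a bounding box `(x_min, y_min, x_max, y_max)` paired with its recognized text, and the function name, the tolerance value, and the sample boxes are hypothetical.

```python
def words_to_lines(detections, line_tolerance=0.5):
    """Group (box, text) pairs into text lines in reading order.

    Two words share a line when their vertical centers differ by at most
    `line_tolerance` times the taller of the word and the line's first word.
    """
    lines = []  # each entry: {"center": float, "height": float, "items": [(x_min, text)]}
    by_vertical_center = sorted(detections, key=lambda d: (d[0][1] + d[0][3]) / 2)
    for (x_min, y_min, x_max, y_max), text in by_vertical_center:
        center = (y_min + y_max) / 2
        height = y_max - y_min
        for line in lines:
            if abs(center - line["center"]) <= line_tolerance * max(height, line["height"]):
                line["items"].append((x_min, text))
                break
        else:
            # No existing line is close enough, so start a new one.
            lines.append({"center": center, "height": height, "items": [(x_min, text)]})
    # Inside each line, order the words from left to right.
    return [" ".join(text for _, text in sorted(line["items"])) for line in lines]


# Hypothetical detections in arbitrary order.
detections = [
    ((120, 10, 200, 30), "world"),
    ((10, 12, 100, 32), "hello"),
    ((10, 50, 90, 70), "second"),
    ((100, 52, 160, 72), "line"),
]
print(words_to_lines(detections))  # ['hello world', 'second line']
```

This simple grouping assumes horizontal, non-overlapping lines of text; rotated or multi-column layouts would need a more careful approach.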
211-speech-to-text
This project utilizes the QuartzNet 15x5 model for automatic speech recognition (ASR).
Project Repository
Testing Steps
1、Import the necessary libraries and modules and install the required dependencies.
2、Set up various variables, such as model folder path, download folder path, data folder path, precision, model name, etc.
3、Download and convert public models:
a. Use the omz_downloader tool to download the selected model.
b. Use the omz_converter tool to convert the downloaded PyTorch model to the OpenVINO IR format.
4、Audio Processing:
a. Load the audio file.
b. Convert the audio file into a Mel spectrogram.
c. Adjust the Mel spectrogram to fit the format expected by the model's input.
5、Load the model:
a. Create an instance of OpenVINO's Core class.
b. Read and load the model.
c. Compile the model for inference.
6、Perform inference:
a. Pass the input to the loaded model and run the inference.
b. Retrieve the model's output.
7、Decode the output:
a. Post-process the model's output to convert it into a more readable text format.
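Steps 5 to 7 can be sketched as follows. This is a minimal outline under stated assumptions, not the notebook's code. It assumes an OpenVINO release that supports `import openvino as ov`, a converted IR file whose input is shaped (1, mel bands, time), a Mel spectrogram already prepared as a NumPy float32 array (step 4), and an alphabet in which the CTC blank symbol `~` comes last. The function names and the `ALPHABET` constant are illustrative.

```python
ALPHABET = " abcdefghijklmnopqrstuvwxyz'~"  # assumed: "~" is the CTC blank, placed last


def ctc_greedy_decode(indices, alphabet=ALPHABET):
    """Turn per-frame symbol indices into text: merge repeats, drop blanks."""
    blank_id = len(alphabet) - 1
    previous = blank_id
    characters = []
    for index in indices:
        index = int(index)
        if index != previous and index != blank_id:
            characters.append(alphabet[index])
        previous = index
    return "".join(characters)


def transcribe(model_xml, mel_spectrogram, device="CPU"):
    """Load, compile, and run the model, then decode its output.

    `mel_spectrogram` is assumed to be a NumPy float32 array shaped
    (1, n_mels, time).
    """
    import openvino as ov  # imported lazily so the decoder works without OpenVINO

    core = ov.Core()
    model = core.read_model(model_xml)
    # Make the time dimension dynamic so audio of any length fits.
    shape = model.input(0).partial_shape
    shape[2] = -1
    model.reshape({model.input(0): shape})
    compiled_model = core.compile_model(model, device)
    probabilities = compiled_model([mel_spectrogram])[compiled_model.output(0)]
    # Remove the batch dimension and keep the most likely symbol per frame.
    indices = probabilities.squeeze().argmax(axis=-1)
    return ctc_greedy_decode(indices)


# Decoder demo with hand-written indices: h=8, e=5, l=12, blank=28, l=12, o=15.
print(ctc_greedy_decode([8, 5, 12, 28, 12, 15]))  # hello
```

A call such as `transcribe("quartznet-15x5-en.xml", mel)` would then return the recognized text, where both the file name and `mel` are placeholders for the converted model and the prepared spectrogram.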
Test Results
- The response time is relatively fast.
- The recognition results are accurate, as seen in the provided audio.
Summary
Text recognition technologies, including OCR and speech-to-text, continue to advance in the fields of computer vision and artificial intelligence. In our testing and evaluation on the OpenVINO platform, the text recognition projects presented in this report performed well in terms of accuracy, speed, and practicality. Below is a partial summary of the findings:
Function | Chinese and Japanese character recognition | English word recognition | Speech-to-text conversion | Chinese and English text recognition |
Accuracy (1 to 5)* | 3 | 5 | 4 | Cannot run on the LattePanda 3 Delta; a higher-performance SBC is recommended |
GPU support | 1 | 1 | 1 | |
Recommended value | 3 | 4 | 4 | |
* Accuracy is judged subjectively on a scale of 1 to 5, and all of the recognition projects listed showed high accuracy. Please note that these values are relative and for recommendation purposes only.
If you require high-precision OCR recognition, it is recommended to use the "optical-character-recognition" project. It offers higher recognition accuracy and performs well even on small fonts, although it can only recognize individual words. For complete sentence output, additional post-processing is needed. If you have a higher-performance Single Board Computer (SBC), you may also consider trying the "paddle-ocr-webcam" project, as it demonstrates excellent recognition performance for real-time webcam and video stream input.
In addition to the mentioned projects, OpenVINO offers a wealth of other functionalities. You might be interested in exploring the following articles:
OpenVINO Running on LattePanda 3 Delta Single Board Computer (1) - Object Detection
OpenVINO Running on LattePanda 3 Delta Single Board Computer (3) - NLP
OpenVINO Running on LattePanda 3 Delta Single Board Computer (4) - Pose Estimation
OpenVINO Running on LattePanda 3 Delta Single Board Computer (5) - Audio Processing