|
|
@@ -15,48 +15,75 @@ The text recognition module is the core component of an OCR (Optical Character R
|
|
|
<th>Recognition Avg Accuracy(%)</th>
|
|
|
<th>GPU Inference Time (ms)</th>
|
|
|
<th>CPU Inference Time (ms)</th>
|
|
|
-<th>Model Size (M)</th>
|
|
|
-<th>Description</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
+<td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has added the ability to recognize some traditional Chinese characters, Japanese, and special characters, and can support the recognition of more than 15,000 characters. In addition to improving the text recognition capability related to documents, it also enhances the general text recognition capability.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
<td>78.20</td>
|
|
|
<td>7.95018</td>
|
|
|
<td>46.7868</td>
|
|
|
<td>10.6 M</td>
|
|
|
-<td rowspan="2">PP-OCRv4, developed by Baidu's PaddlePaddle Vision Team, is the next version of the PP-OCRv3 text recognition model. By introducing data augmentation schemes, GTC-NRTR guidance branches, and other strategies, it further improves text recognition accuracy without compromising model inference speed. The model offers both server and mobile versions to meet industrial needs in different scenarios.</td>
|
|
|
+<td>The PP-OCRv4 recognition model is a further upgrade of PP-OCRv3. At comparable speed, accuracy in Chinese and English scenarios is further improved, and the average recognition accuracy of the 80-language multilingual model is raised by more than 8%.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>PP-OCRv4_server_rec </td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
+<td>PP-OCRv4_server_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
<td>79.20</td>
|
|
|
<td>7.19439</td>
|
|
|
<td>140.179</td>
|
|
|
<td>71.2 M</td>
|
|
|
+<td>A high-precision server-side text recognition model, featuring high accuracy, fast speed, and multilingual support. It is suitable for text recognition tasks in various scenarios.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>The ultra-lightweight English text recognition model released by PaddleOCR in May 2023. It is small in size and fast in speed, and can achieve millisecond-level prediction on CPU. Compared with the PP-OCRv3 English model, the recognition accuracy is improved by 6%, and it is suitable for text recognition tasks in various scenarios.</td>
|
|
|
</tr>
|
|
|
</table>
|
|
|
|
|
|
-<b>Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, handwriting, and more, with 1.1w images for text recognition. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+<b>Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering multiple scenarios such as street view, web images, documents, and handwriting, with 11,000 images for text recognition. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
|
|
|
-> ❗ The above list features the <b>2 core models</b> that the image classification module primarily supports. In total, this module supports <b>4 models</b>. The complete list of models is as follows:
|
|
|
+> ❗ The above list features the <b>4 core models</b> that the text recognition module primarily supports. In total, this module supports <b>18 models</b>. The complete list of models is as follows:
|
|
|
|
|
|
<details><summary> 👉Model List Details</summary>
|
|
|
|
|
|
+* <b>Chinese Recognition Model</b>
|
|
|
+
|
|
|
<table>
|
|
|
<tr>
|
|
|
<th>Model</th><th>Model Download Link</th>
|
|
|
<th>Recognition Avg Accuracy(%)</th>
|
|
|
<th>GPU Inference Time (ms)</th>
|
|
|
<th>CPU Inference Time (ms)</th>
|
|
|
-<th>Model Size (M)</th>
|
|
|
-<th>Description</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
+<td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has added the recognition capabilities for some traditional Chinese characters, Japanese, and special characters. The number of recognizable characters is over 15,000. In addition to the improvement in document-related text recognition, it also enhances the general text recognition capability.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
<td>78.20</td>
|
|
|
<td>7.95018</td>
|
|
|
<td>46.7868</td>
|
|
|
<td>10.6 M</td>
|
|
|
-<td rowspan="2">PP-OCRv4, developed by Baidu's PaddlePaddle Vision Team, is the next version of the PP-OCRv3 text recognition model. By introducing data augmentation schemes, GTC-NRTR guidance branches, and other strategies, it further improves text recognition accuracy without compromising model inference speed. The model offers both server and mobile versions to meet industrial needs in different scenarios.</td>
|
|
|
+<td>The PP-OCRv4 recognition model is an upgrade of PP-OCRv3. At comparable speed, accuracy in Chinese and English scenarios is further improved, and the average recognition accuracy of the 80-language multilingual model is raised by more than 8%.</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>PP-OCRv4_server_rec </td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
@@ -64,51 +91,186 @@ The text recognition module is the core component of an OCR (Optical Character R
|
|
|
<td>7.19439</td>
|
|
|
<td>140.179</td>
|
|
|
<td>71.2 M</td>
|
|
|
+<td>A high-precision server-side text recognition model, featuring high accuracy, fast speed, and multilingual support. It is suitable for text recognition tasks in various scenarios.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>An ultra-lightweight OCR model suitable for mobile applications. It adopts an encoder-decoder structure based on Transformer and enhances recognition accuracy and efficiency through techniques such as data augmentation and mixed precision training. The model size is 10.6M, making it suitable for deployment on resource-constrained devices. It can be used in scenarios such as mobile photo translation and business card recognition.</td>
|
|
|
</tr>
|
|
|
</table>
|
|
|
|
|
|
-<p><b>Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street scenes, web images, documents, handwriting, and more, with 1.1w images for text recognition. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
|
|
|
+<p><b>Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering multiple scenarios such as street view, web images, documents, and handwriting, with 11,000 images for text recognition. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
|
|
|
+
|
|
|
<table>
|
|
|
<tr>
|
|
|
<th>Model</th><th>Model Download Link</th>
|
|
|
<th>Recognition Avg Accuracy(%)</th>
|
|
|
<th>GPU Inference Time (ms)</th>
|
|
|
<th>CPU Inference Time</th>
|
|
|
-<th>Model Size (M)</th>
|
|
|
-<th>Description</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>ch_SVTRv2_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/ch_SVTRv2_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_SVTRv2_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
+<td>ch_SVTRv2_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/ch_SVTRv2_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_SVTRv2_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
<td>68.81</td>
|
|
|
<td>8.36801</td>
|
|
|
<td>165.706</td>
|
|
|
<td>73.9 M</td>
|
|
|
-<td rowspan="1">SVTRv2, a server-side text recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University, also won first place in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge. Its A-rank end-to-end recognition accuracy is 6% higher than PP-OCRv4.
|
|
|
+<td rowspan="1">
|
|
|
+SVTRv2 is a server-side text recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University. It won first prize in Task 1 (OCR End-to-End Recognition) of the PaddleOCR Algorithm Model Challenge, with end-to-end recognition accuracy on the A leaderboard 6% higher than that of PP-OCRv4.
|
|
|
</td>
|
|
|
</tr>
|
|
|
</table>
|
|
|
|
|
|
-<p><b>Note: The evaluation set for the above accuracy metrics is the <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge - Track 1</a> A-rank. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
|
|
|
+<p><b>Note: The evaluation set for the above accuracy metrics is the A leaderboard of <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">Task 1 (OCR End-to-End Recognition) of the PaddleOCR Algorithm Model Challenge</a>. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
|
|
|
<table>
|
|
|
<tr>
|
|
|
<th>Model</th><th>Model Download Link</th>
|
|
|
<th>Recognition Avg Accuracy(%)</th>
|
|
|
<th>GPU Inference Time (ms)</th>
|
|
|
<th>CPU Inference Time</th>
|
|
|
-<th>Model Size (M)</th>
|
|
|
-<th>Description</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
-<td>ch_RepSVTR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/ch_RepSVTR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_RepSVTR_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
+<td>ch_RepSVTR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/ch_RepSVTR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ch_RepSVTR_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
<td>65.07</td>
|
|
|
<td>10.5047</td>
|
|
|
<td>51.5647</td>
|
|
|
<td>22.1 M</td>
|
|
|
-<td rowspan="1"> RepSVTR, a mobile text recognition model based on SVTRv2, won first place in the OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge. Its B-rank end-to-end recognition accuracy is 2.5% higher than PP-OCRv4, with comparable inference speed.</td>
|
|
|
+<td rowspan="1">RepSVTR is a mobile text recognition model based on SVTRv2. It won first prize in Task 1 (OCR End-to-End Recognition) of the PaddleOCR Algorithm Model Challenge, with end-to-end recognition accuracy on the B leaderboard 2.5% higher than that of PP-OCRv4 at comparable inference speed.</td>
|
|
|
+</tr>
|
|
|
+</table>
|
|
|
+
|
|
|
+<p><b>Note: The evaluation set for the above accuracy metrics is the B leaderboard of <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">Task 1 (OCR End-to-End Recognition) of the PaddleOCR Algorithm Model Challenge</a>. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
|
|
|
+
|
|
|
+* <b>English Recognition Model</b>
|
|
|
+
|
|
|
+<table>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>Recognition Avg Accuracy(%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>[Latest] Further upgraded from PP-OCRv3, with improved accuracy at comparable speed.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>en_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/en_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Ultra-lightweight model, supporting English and numeric recognition.</td>
|
|
|
+</tr>
|
|
|
+</table>
|
|
|
+
|
|
|
+* <b>Multilingual Recognition Model</b>
|
|
|
+
|
|
|
+<table>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>Recognition Avg Accuracy(%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>korean_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/korean_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Korean Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>japan_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/japan_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Japanese Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>chinese_cht_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/chinese_cht_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Traditional Chinese Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>te_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/te_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Telugu Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>ka_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ka_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Kannada Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>ta_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/ta_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Tamil Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>latin_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/latin_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Latin Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>arabic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/arabic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Arabic Script Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>cyrillic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/cyrillic_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Cyrillic Script Recognition</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>devanagari_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Training Model</a></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td></td>
|
|
|
+<td>Devanagari Script Recognition</td>
|
|
|
</tr>
|
|
|
</table>
|
|
|
|
|
|
-<p><b>Note: The evaluation set for the above accuracy metrics is the <a href="https://aistudio.baidu.com/competition/detail/1131/0/introduction">OCR End-to-End Recognition Task of the PaddleOCR Algorithm Model Challenge - Track 1</a> B-rank. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p></details>
|
|
|
+</details>
|
|
|
|
|
|
## III. Quick Integration
|
|
|
Before quick integration, you need to install the PaddleX wheel package. For the installation method, please refer to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md). After installing the wheel package, a few lines of code can complete the inference of the text recognition module. You can switch models under this module freely, and you can also integrate the model inference of the text recognition module into your project.
|
|
|
@@ -126,6 +288,146 @@ for res in output:
|
|
|
```
|
|
|
For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.en.md).
|
|
|
|
|
|
+After running, the result obtained is:
|
|
|
+```bash
|
|
|
+{'input_path': 'general_ocr_rec_001.png', 'rec_text': 'Oasis Shigewei Garden Apartment', 'rec_score': 0.9875298738479614}
|
|
|
+```
|
|
|
+
|
|
|
+The visualized image is as follows:
|
|
|
+
|
|
|
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/text_recog/general_ocr_rec_001.png" alt="Text recognition result visualization">
|
|
|
+
|
|
|
+In the above Python script, the following steps are executed:
|
|
|
+* `create_model` instantiates the text recognition model (here, `PP-OCRv4_mobile_rec` is taken as an example)
|
|
|
+* The `predict` method of the text recognition model is called for inference. Its parameter `x` carries the data to be predicted and supports multiple input types, described below:
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Parameter</th>
|
|
|
+<th>Parameter Description</th>
|
|
|
+<th>Parameter Type</th>
|
|
|
+<th>Options</th>
|
|
|
+<th>Default Value</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tr>
|
|
|
+<td><code>x</code></td>
|
|
|
+<td>Data to be predicted, supporting multiple input types</td>
|
|
|
+<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
|
|
|
+<td>
|
|
|
+<ul>
|
|
|
+ <li><b>Python variable</b>, such as image data represented by <code>numpy.ndarray</code></li>
|
|
|
+ <li><b>File path</b>, such as the local path of an image file: <code>/root/data/img.jpg</code></li>
|
|
|
+ <li><b>URL link</b>, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png">Example</a></li>
|
|
|
+ <li><b>Local directory</b>, this directory should contain the data files to be predicted, such as the local path: <code>/root/data/</code></li>
|
|
|
+ <li><b>Dictionary</b>, the <code>key</code> of the dictionary should correspond to the specific task, such as <code>"img"</code> for image classification tasks. The <code>value</code> of the dictionary supports the above types of data, for example: <code>{"img": "/root/data1"}</code></li>
|
|
|
+ <li><b>List</b>, the elements of the list should be the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code></li>
|
|
|
+</ul>
|
|
|
+</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>module_name</code></td>
|
|
|
+<td>Name of the single-function module</td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td><code>text_recognition</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>model_name</code></td>
|
|
|
+<td>Name of the model</td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td><code>PP-OCRv4_mobile_rec</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>model_dir</code></td>
|
|
|
+<td>Path where the model is stored</td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td><code>null</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>batch_size</code></td>
|
|
|
+<td>Batch size</td>
|
|
|
+<td><code>int</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td>1</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>score_thresh</code></td>
|
|
|
+<td>Score threshold</td>
|
|
|
+<td><code>int</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td><code>0</code></td>
|
|
|
+</tr>
|
|
|
+</table>
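The input-type dispatch described in the table above can be sketched in plain Python. This is a hedged illustration, not the PaddleX implementation: `normalize_input` is a hypothetical helper showing how a single array, file path, URL, dictionary, directory, or nested list might be flattened into one uniform list of items to predict.

```python
from pathlib import Path
from typing import Any, List

def normalize_input(x: Any) -> List[Any]:
    """Hypothetical sketch: flatten the accepted `x` types into a list."""
    if isinstance(x, list):
        # a list of any accepted type: flatten recursively
        items: List[Any] = []
        for element in x:
            items.extend(normalize_input(element))
        return items
    if isinstance(x, dict):
        # e.g. {"img": "/root/data1"}: recurse into the values
        return [v for value in x.values() for v in normalize_input(value)]
    if isinstance(x, str):
        p = Path(x)
        if p.is_dir():
            # a local directory expands to the files it contains
            return [str(f) for f in sorted(p.iterdir()) if f.is_file()]
        return [x]  # a single file path or URL
    return [x]  # e.g. image data as a numpy.ndarray
```

In the real module, each normalized item would then be loaded (from disk or over the network) and passed through the recognizer.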
|
|
|
+
|
|
|
+* Process the prediction results. The prediction result for each sample is of `dict` type, and supports operations such as printing, saving as an image, and saving as a `json` file:
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Method</th>
|
|
|
+<th>Method Description</th>
|
|
|
+<th>Parameter</th>
|
|
|
+<th>Parameter Type</th>
|
|
|
+<th>Parameter Description</th>
|
|
|
+<th>Default Value</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tr>
|
|
|
+<td rowspan = "3"><code>print</code></td>
|
|
|
+<td rowspan = "3">Print the result to the terminal</td>
|
|
|
+<td><code>format_json</code></td>
|
|
|
+<td><code>bool</code></td>
|
|
|
+<td>Whether to format the output content using <code>json</code> indentation</td>
|
|
|
+<td><code>True</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>indent</code></td>
|
|
|
+<td><code>int</code></td>
|
|
|
+<td>JSON formatting setting, only effective when <code>format_json</code> is <code>True</code></td>
|
|
|
+<td>4</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>ensure_ascii</code></td>
|
|
|
+<td><code>bool</code></td>
|
|
|
+<td>JSON formatting setting, only effective when <code>format_json</code> is <code>True</code></td>
|
|
|
+<td><code>False</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td rowspan = "3"><code>save_to_json</code></td>
|
|
|
+<td rowspan = "3">Save the result as a JSON file</td>
|
|
|
+<td><code>save_path</code></td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>The path where the file is saved. If it is a directory, the saved file name is consistent with the input file name</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>indent</code></td>
|
|
|
+<td><code>int</code></td>
|
|
|
+<td>JSON formatting setting</td>
|
|
|
+<td>4</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>ensure_ascii</code></td>
|
|
|
+<td><code>bool</code></td>
|
|
|
+<td>JSON formatting setting</td>
|
|
|
+<td><code>False</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>save_to_img</code></td>
|
|
|
+<td>Save the result as an image file</td>
|
|
|
+<td><code>save_path</code></td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>The path where the file is saved. If it is a directory, the saved file name is consistent with the input file name</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+</table>
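The `print` and `save_to_json` behaviors in the table above map closely onto the standard library's `json.dumps`. The following is a small sketch under that assumption; `format_result` and `save_result_to_json` are illustrative names, not PaddleX APIs.

```python
import json

def format_result(result: dict, format_json: bool = True,
                  indent: int = 4, ensure_ascii: bool = False) -> str:
    """Mimic `print`: pretty-print as JSON when format_json is True."""
    if format_json:
        return json.dumps(result, indent=indent, ensure_ascii=ensure_ascii)
    return str(result)

def save_result_to_json(result: dict, save_path: str,
                        indent: int = 4, ensure_ascii: bool = False) -> None:
    """Mimic `save_to_json`: write the result dict to a JSON file."""
    with open(save_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(result, indent=indent, ensure_ascii=ensure_ascii))

# A result dict shaped like the module's output shown earlier
result = {"input_path": "general_ocr_rec_001.png",
          "rec_text": "Oasis Shigewei Garden Apartment",
          "rec_score": 0.9875298738479614}
print(format_result(result))
```

With `ensure_ascii=False`, non-ASCII recognized text is written as-is rather than escaped, which is why the table lists `False` as the default.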
|
|
|
+
|
|
|
+
|
|
|
## IV. Custom Development
|
|
|
If you are seeking higher accuracy from existing models, you can use PaddleX's custom development capabilities to develop better text recognition models. Before using PaddleX to develop text recognition models, please ensure that you have installed the relevant model training plugins for OCR in PaddleX. The installation process can be found in the custom development section of the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
|
|
|
|