|
|
@@ -31,17 +31,19 @@ The formula recognition pipeline is designed to solve formula recognition tasks
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
-<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>92.6</td>
|
|
|
-<td>115.126</td>
|
|
|
-<td>3827.25</td>
|
|
|
-<td>470.2M</td>
|
|
|
+<td>PP-LCNet_x1_0_doc_ori</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>99.06</td>
|
|
|
+<td>3.84845</td>
|
|
|
+<td>9.23735</td>
|
|
|
+<td>7</td>
|
|
|
+<td>A document image classification model based on PP-LCNet_x1_0, with four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees.</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
-<b>Note: The accuracy metrics are evaluated on a self-built dataset covering ID cards and documents, with 1000 images. GPU inference time is based on an NVIDIA Tesla T4 machine, with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and FP32 precision.</b>
|
|
|
+<b>Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
|
|
|
-<p><b>Text Image Unwarping Module (Optional):</b></p>
|
|
|
+
|
|
|
+<p><b>Text Image Correction Module (Optional):</b></p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
@@ -54,18 +56,237 @@ The formula recognition pipeline is designed to solve formula recognition tasks
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
-<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
|
|
|
-<td>0.8821</td>
|
|
|
-<td>0.0823</td>
|
|
|
-<td>40.01</td>
|
|
|
-<td>-</td>
|
|
|
+<td>UVDoc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>0.179</td>
|
|
|
+<td>30.3 M</td>
|
|
|
+<td>High-precision text image correction model</td>
|
|
|
+</tr>
|
|
|
+</tbody>
|
|
|
+</table>
|
|
|
+<b>Note: The accuracy metrics of the model are measured on the <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet benchmark</a>.</b>
|
|
|
+
|
|
|
+<p><b>Layout Detection Module (Optional):</b></p>
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PP-DocLayout-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>90.4</td>
|
|
|
+<td>34.5252</td>
|
|
|
+<td>1454.27</td>
|
|
|
+<td>123.76 M</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-DocLayout-M</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-M_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>75.2</td>
|
|
|
+<td>15.9</td>
|
|
|
+<td>160.1</td>
|
|
|
+<td>22.578</td>
|
|
|
+<td>A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-DocLayout-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>70.9</td>
|
|
|
+<td>13.8</td>
|
|
|
+<td>46.7</td>
|
|
|
+<td>4.834</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S.</td>
|
|
|
+</tr>
|
|
|
+</tbody>
|
|
|
+</table>
|
|
|
+
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 500 common document-type images of Chinese and English papers, magazines, contracts, books, exams, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+
|
|
|
+> ❗ The above list covers the <b>3 core models</b> primarily supported by the layout detection module. In total, the module supports <b>11 models</b>, including several predefined models with different category sets. The complete model list is as follows:
|
|
|
+
|
|
|
+<details><summary> 👉 Details of Model List</summary>
|
|
|
+
|
|
|
+* <b>Table Layout Detection Model</b>
+
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>97.5</td>
|
|
|
+<td>12.623</td>
|
|
|
+<td>90.8934</td>
|
|
|
+<td>7.4 M</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset using PicoDet-1x, capable of detecting table regions.</td>
|
|
|
+</tr>
|
|
|
+</tbody>
+</table>
|
|
|
+
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout table area detection dataset by PaddleOCR, containing 7835 Chinese and English document images with tables. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+
|
|
|
+* <b>3-Class Layout Detection Model, including Table, Image, and Stamp</b>
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet-S_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_3cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>88.2</td>
|
|
|
+<td>13.5</td>
|
|
|
+<td>45.8</td>
|
|
|
+<td>4.8</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>89.0</td>
|
|
|
+<td>15.7</td>
|
|
|
+<td>159.8</td>
|
|
|
+<td>22.6</td>
|
|
|
+<td>A layout area localization model with balanced efficiency and precision, trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>95.8</td>
|
|
|
+<td>114.6</td>
|
|
|
+<td>3832.6</td>
|
|
|
+<td>470.1</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
|
|
|
+</tr>
|
|
|
+</tbody>
+</table>
|
|
|
+
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 1154 common document images of Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+
|
|
|
+* <b>5-Class English Document Area Detection Model, including Text, Title, Table, Image, and List</b>
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>97.8</td>
|
|
|
+<td>13.0</td>
|
|
|
+<td>91.3</td>
|
|
|
+<td>7.4</td>
|
|
|
+<td>A high-efficiency English document layout area localization model trained on the PubLayNet dataset using PicoDet-1x.</td>
|
|
|
+</tr>
|
|
|
+</tbody>
+</table>
|
|
|
+
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is the <a href="https://developer.ibm.com/exchanges/data/all/publaynet/">PubLayNet</a> dataset, containing 11245 English document images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+
|
|
|
+* <b>17-Class Area Detection Model, including 17 common layout categories: Paragraph Title, Image, Text, Number, Abstract, Content, Figure Caption, Formula, Table, Table Caption, References, Document Title, Footnote, Header, Algorithm, Footer, and Stamp</b>
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>mAP(0.5) (%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>CPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tbody>
|
|
|
+<tr>
|
|
|
+<td>PicoDet-S_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>87.4</td>
|
|
|
+<td>13.6</td>
|
|
|
+<td>46.2</td>
|
|
|
+<td>4.8</td>
|
|
|
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
|
|
|
+</tr>
|
|
|
+
|
|
|
+<tr>
|
|
|
+<td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>89.0</td>
|
|
|
+<td>17.2</td>
|
|
|
+<td>160.2</td>
|
|
|
+<td>22.6</td>
|
|
|
+<td>A layout area localization model with balanced efficiency and precision, trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
|
|
|
+</tr>
|
|
|
+
|
|
|
+<tr>
|
|
|
+<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>98.3</td>
|
|
|
+<td>115.1</td>
|
|
|
+<td>3827.2</td>
|
|
|
+<td>470.2</td>
|
|
|
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
|
|
|
+</tr>
|
|
|
+</tbody>
|
|
|
+</table>
|
|
|
+
|
|
|
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 892 common document images of Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
|
|
|
+</details>
|
|
|
+
|
|
|
+<p><b>Formula Recognition Module</b></p>
|
|
|
+<table>
|
|
|
+<tr>
|
|
|
+<th>Model</th><th>Model Download Link</th>
|
|
|
+<th>Avg-BLEU(%)</th>
|
|
|
+<th>GPU Inference Time (ms)</th>
|
|
|
+<th>Model Storage Size (M)</th>
|
|
|
+<th>Introduction</th>
|
|
|
+</tr>
|
|
|
+<tr>
+<td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>86.13</td>
|
|
|
+<td>2266.96</td>
|
|
|
+<td>1.4 G</td>
|
|
|
+<td>UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. The model is trained on a dataset of one million samples, including simple formulas, complex formulas, scanned formulas, and handwritten formulas, significantly improving the recognition accuracy of real-world formulas.</td>
+</tr>
|
|
|
+<tr>
|
|
|
+<td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>87.12</td>
|
|
|
+<td>202.25</td>
|
|
|
+<td>167.9 M</td>
|
|
|
+<td rowspan="2">PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle Vision Team. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network. Through parallel masking and model distillation techniques, it significantly improves inference speed while maintaining high recognition accuracy, making it suitable for applications requiring fast inference. The PP-FormulaNet-L version, on the other hand, uses Vary_VIT_B as its backbone network and is trained on a large-scale formula dataset, showing significant improvements in recognizing complex formulas compared to PP-FormulaNet-S.</td>
|
|
|
+</tr>
|
|
|
+<tr>
+<td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>92.13</td>
|
|
|
+<td>1976.52</td>
|
|
|
+<td>535.2 M</td>
+</tr>
|
|
|
+<tr>
|
|
|
+<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Training Model</a></td>
|
|
|
+<td>71.63</td>
|
|
|
<td>-</td>
|
|
|
<td>89.7 M</td>
|
|
|
-<td>LaTeX-OCR is a formula recognition algorithm based on a large autoregressive model. By using Hybrid ViT as the backbone network and transformer as the decoder, it significantly improves the accuracy of formula recognition.</td>
|
|
|
+<td>LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. It uses Hybrid ViT as the backbone network and a transformer as the decoder, significantly improving the accuracy of formula recognition.</td>
|
|
|
</tr>
|
|
|
</table>
|
|
|
|
|
|
-<b>Note: The above accuracy metrics are measured from the internally built formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.</b>
|
|
|
+
|
|
|
+
|
|
|
+<b>Note: The above accuracy metrics are measured on PaddleX's internal formula recognition test set. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on machines with NVIDIA Tesla V100 GPUs, with FP32 precision.</b>
|
|
|
|
|
|
## 2. Quick Start
|
|
|
PaddleX supports experiencing the formula recognition pipeline locally using the command line or Python.
|
|
|
@@ -89,15 +310,15 @@ paddlex --pipeline formula_recognition \
|
|
|
--device gpu:0
|
|
|
```
|
|
|
|
|
|
-The relevant parameter descriptions can be referenced from [2.2.2 Python Script Integration](#222-python-script-integration).
|
|
|
+For descriptions of the relevant parameters, refer to [2.2 Integration via Python Script](#22-integration-via-python-script).
|
|
|
|
|
|
After running, the results will be printed to the terminal, as shown below:
|
|
|
|
|
|
```bash
|
|
|
-{'res': {'input_path': 'general_formula_recognition.png', 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
|
|
|
+{'res': {'input_path': 'general_formula_recognition.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
|
|
|
```
|
|
|
|
|
|
-The explanation of the running result parameters can refer to the result interpretation in [2.2.2 Integration via Python Script](#222-python脚本方式集成).
|
|
|
+For an explanation of the output parameters, refer to the result interpretation in [2.2 Integration via Python Script](#22-integration-via-python-script).
|
|
|
|
|
|
The visualization results are saved under `save_path`, where the visualization result of formula recognition is as follows:
|
|
|
|
|
|
@@ -197,7 +418,7 @@ In the above Python script, the following steps are executed:
|
|
|
<td>
|
|
|
<ul>
|
|
|
<li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
|
|
|
- <li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
|
|
|
+ <li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
|
|
|
 <li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
|
|
</ul>
|
|
|
</td>
|
|
|
@@ -377,6 +598,8 @@ In the above Python script, the following steps are executed:
|
|
|
|
|
|
- `input_path`: `(str)` The input path of the image to be predicted.
|
|
|
|
|
|
+ - `page_index`: `(Union[int, None])` If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `None`
|
|
|
+
|
|
|
- `model_settings`: `(Dict[str, bool])` The model parameters required for the production line configuration.
|
|
|
|
|
|
- `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-production line.
|
|
|
@@ -400,7 +623,7 @@ In the above Python script, the following steps are executed:
|
|
|
- `formula_region_id`: `(int)` The ID number predicted by formula recognition.
|
|
|
- `dt_polys`: `(List[float])` The bounding box coordinates predicted by formula recognition, in the format [x_min, y_min, x_max, y_max], where (x_min, y_min) is the top-left corner and (x_max, y_max) is the bottom-right corner.
|
|
|
|
|
|
-- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list format.
|
|
|
+- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list format.
|
|
|
- Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_formula_res_img.{your_img_extension}`. If a file is specified, it will be saved directly to that file. (The production line usually contains many result images, so it is not recommended to specify a specific file path directly, otherwise multiple images will be overwritten and only the last one will be retained.)
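+
+For reference, below is a minimal sketch of how these save methods are typically used together. The pipeline name `formula_recognition` and the demo image URL are taken from the examples earlier in this document; the `./output/` directory is just a placeholder.
+
+```python
+from paddlex import create_pipeline
+
+# Create the formula recognition pipeline (same pipeline name as in the CLI example).
+pipeline = create_pipeline(pipeline="formula_recognition")
+
+# Predict on the demo image referenced above; a local image/PDF path or a numpy.ndarray also works.
+output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png")
+
+for res in output:
+    res.print()                              # print the structured result to the terminal
+    res.save_to_json(save_path="./output/")  # -> ./output/{your_img_basename}_res.json
+    res.save_to_img(save_path="./output/")   # -> ./output/{your_img_basename}_formula_res_img.{your_img_extension}
+```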
|
|
|
|
|
|
* In addition, you can also obtain the visualization image with results and the prediction results through attributes, as follows:
|
|
|
@@ -459,7 +682,7 @@ for res in output:
|
|
|
## 3. Development Integration/Deployment
|
|
|
If the formula recognition production line meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
|
|
|
|
|
|
-If you need to integrate the formula recognition production line into your Python project, you can refer to the example code in [2.2 Python Script Method](#22-python脚本方式集成).
|
|
|
+If you need to integrate the formula recognition production line into your Python project, you can refer to the example code in [2.2 Integration via Python Script](#22-integration-via-python-script).
|
|
|
|
|
|
In addition, PaddleX also provides three other deployment methods, which are detailed as follows:
|
|
|
|
|
|
@@ -699,7 +922,7 @@ Since the formula recognition pipeline consists of several modules, if the pipel
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>Formulas are missing</td>
|
|
|
- <td>Layout Area Detection Module</td>
|
|
|
+ <td>Layout Detection Module</td>
|
|
|
<td><a href="../../../module_usage/tutorials/ocr_modules/layout_detection.en.md">Link</a></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
@@ -732,13 +955,13 @@ SubModules:
|
|
|
LayoutDetection:
|
|
|
module_name: layout_detection
|
|
|
model_name: PP-DocLayout-L
|
|
|
- model_dir: null # 替换为微调后的版面区域检测模型权重路径
|
|
|
+ model_dir: null # Replace with the fine-tuned layout detection model weights path
|
|
|
...
|
|
|
|
|
|
FormulaRecognition:
|
|
|
module_name: formula_recognition
|
|
|
model_name: PP-FormulaNet-L
|
|
|
- model_dir: null # 替换为微调后的公式识别模型权重路径
|
|
|
+ model_dir: null # Replace with the fine-tuned formula recognition model weights path
|
|
|
batch_size: 5
|
|
|
|
|
|
SubPipelines:
|
|
|
@@ -750,7 +973,7 @@ SubPipelines:
|
|
|
DocOrientationClassify:
|
|
|
module_name: doc_text_orientation
|
|
|
model_name: PP-LCNet_x1_0_doc_ori
|
|
|
- model_dir: null # 替换为微调后的文档图像方向分类模型权重路径
|
|
|
+ model_dir: null # Replace with the fine-tuned document image orientation classification model weights path
|
|
|
batch_size: 1
|
|
|
...
|
|
|
```
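+
+For example, assuming the modified configuration is saved locally as `./my_path/formula_recognition.yaml` (a placeholder path), it can be loaded in a Python script by passing the file path as the `pipeline` argument — a minimal sketch:
+
+```python
+from paddlex import create_pipeline
+
+# Load the pipeline from the locally modified configuration file instead of the default config,
+# so the fine-tuned weights pointed to by model_dir are used.
+pipeline = create_pipeline(pipeline="./my_path/formula_recognition.yaml")
+
+output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png")
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+```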
|
|
|
@@ -760,7 +983,7 @@ Then, refer to the command-line or Python script methods in [2. Quick Start](#2-
|
|
|
## 5. Multi-Hardware Support
|
|
|
PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. You can seamlessly switch between different hardware devices by simply modifying the `--device` parameter.
|
|
|
|
|
|
-For example, if you use Ascend NPU for formula recognition pipeline inference, the Python command is:
|
|
|
+For example, if you use Ascend NPU for formula recognition pipeline inference, the CLI command is:
|
|
|
|
|
|
```bash
|
|
|
paddlex --pipeline formula_recognition \
|
|
|
@@ -776,5 +999,6 @@ paddlex --pipeline formula_recognition \
|
|
|
--device npu:0
|
|
|
|
|
|
```
|
|
|
+Of course, you can also specify the hardware device when calling `create_pipeline()` or `predict()` in a Python script.
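+
+A minimal sketch of doing this in Python (the device string uses the same format as the CLI `--device` flag):
+
+```python
+from paddlex import create_pipeline
+
+# Select the target hardware via the device argument, e.g. "npu:0" for Ascend NPU or "gpu:0" for NVIDIA GPU.
+pipeline = create_pipeline(pipeline="formula_recognition", device="npu:0")
+```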
|
|
|
|
|
|
If you want to use the formula recognition production line on more types of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
|