
Repair doc3 (#3248)

* fix doc

* fix english doc

* remove duplicate tr

* fix doc
liuhongen1234567, 9 months ago
parent commit eebfc00df0

docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md (+65, -65)

@@ -13,51 +13,39 @@ The formula recognition module is a crucial component of OCR (Optical Character
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
-<th>Avg-BLEU</th>
+<th>Avg-BLEU(%)</th>
 <th>GPU Inference Time (ms)</th>
 <th>Model Storage Size (M)</th>
 <th>Introduction</th>
 </tr>
 <td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">Training Model</a></td>
-<td>0.8613</td>
+<td>86.13</td>
 <td>2266.96</td>
 <td>1.4 G</td>
 <td>UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. The model is trained on a dataset of one million samples, including simple formulas, complex formulas, scanned formulas, and handwritten formulas, significantly improving the recognition accuracy of real-world formulas.</td>
 <tr>
 <td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">Training Model</a></td>
-<td>0.8712</td>
+<td>87.12</td>
 <td>202.25</td>
 <td>167.9 M</td>
 <td rowspan="2">PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle Vision Team. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network. Through parallel masking and model distillation techniques, it significantly improves inference speed while maintaining high recognition accuracy, making it suitable for applications requiring fast inference. The PP-FormulaNet-L version, on the other hand, uses Vary_VIT_B as its backbone network and is trained on a large-scale formula dataset, showing significant improvements in recognizing complex formulas compared to PP-FormulaNet-S.</td>
 </tr>
 <td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">Training Model</a></td>
-<td>0.9213</td>
+<td>92.13</td>
 <td>1976.52</td>
 <td>535.2 M</td>
-</table>
-
-<b>Note: The above accuracy metrics are measured from the internal formula recognition test set of PaddleX. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.</b>
-
-<table>
-<tr>
-<th>Model</th><th>Model Download Link</th>
-<th>BLEU Score</th>
-<th>Normed Edit Distance</th>
-<th>ExpRate (%)</th>
-<th>Model Storage Size (M)</th>
-<th>Introduction</th>
-</tr>
 <tr>
 <td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Training Model</a></td>
-<td>0.8821</td>
-<td>0.0823</td>
-<td>40.01</td>
+<td>71.63</td>
+<td>-</td>
 <td>89.7 M</td>
 <td>LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. It uses Hybrid ViT as the backbone network and a transformer as the decoder, significantly improving the accuracy of formula recognition.</td>
 </tr>
 </table>
 
-<b>Note: The above accuracy metrics are measured from the LaTeX-OCR formula recognition test set.</b>
+
+
+<b>Note: The above accuracy metrics are measured using an internally built formula recognition test set within PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on machines with Tesla V100 GPUs, with precision type FP32.</b>
 
 ## III. Quick Integration
 > ❗ Before quick integration, please install the PaddleX wheel package. For details, please refer to the [PaddleX Local Installation Guide](../../../installation/installation.md)
@@ -66,7 +54,6 @@ After installing the wheel package, you can complete the inference of the formul
 
 ```python
 from paddlex import create_model
-
 model = create_model(model_name="PP-FormulaNet-S")
 output = model.predict(input="general_formula_rec_001.png", batch_size=1)
 for res in output:
@@ -78,18 +65,19 @@ for res in output:
 After running, the result obtained is:
 
 ````
-{'res': {'input_path': 'general_formula_rec_001.png', 'rec_formula': '\\zeta_{0}(\\nu)=-{\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z{\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}}\\ \\ {vec\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}
+{'res': {'input_path': 'general_formula_rec_001.png', 'page_index': None, 'rec_formula': '\\zeta_{0}(\\nu)=-{\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z{\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}}\\ \\ {vec\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}
 ````
 
 The meanings of the running results parameters are as follows:
 - `input_path`: Indicates the path to the input image of the formula to be predicted.
+- `page_index`: If the input is a PDF file, this indicates the current page number of the PDF; otherwise, it is `None`.
 - `rec_formula`: Indicates the predicted LaTeX source code of the formula image.
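For reference, a minimal sketch of how such a result might be produced and saved, assuming the result objects expose the `print`, `save_to_img`, and `save_to_json` helpers that PaddleX result objects commonly provide (the output directory is illustrative; rendering the image additionally requires the LaTeX packages described below):

```python
from paddlex import create_model

model = create_model(model_name="PP-FormulaNet-S")
output = model.predict(input="general_formula_rec_001.png", batch_size=1)
for res in output:
    res.print()                    # print input_path, page_index and rec_formula
    res.save_to_img("./output/")   # render the recognized formula (needs the LaTeX environment below)
    res.save_to_json("./output/")  # persist the raw prediction as JSON
```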
 
 The visualization image is as follows:
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/formula_recog/general_formula_rec_001_res.png">
 
-<b>Note: If you need to visualize the formula recognition pipeline, you need to run the following commands to install the LaTeX rendering environment:</b>
+<b>Note: If you need to visualize the formula recognition pipeline, you need to run the following commands to install the LaTeX rendering environment. Currently, formula recognition visualization is supported only on Ubuntu; other environments are not supported at this time. For complex formulas, the LaTeX output may contain advanced constructs that cannot be rendered successfully in environments such as Markdown:</b>
 ```bash
 sudo apt-get update
 sudo apt-get install texlive texlive-latex-base texlive-latex-extra -y
@@ -126,7 +114,7 @@ The explanations for the methods, parameters, etc., are as follows:
 
 * The `model_name` must be specified. After specifying `model_name`, the default model parameters built into PaddleX are used. If `model_dir` is specified, the user-defined model is used.
 
-* The `predict()` method of the text recognition model is called for inference prediction. The `predict()` method has parameters `input` and `batch_size`, which are explained as follows:
+* The `predict()` method of the formula recognition model is called for inference prediction. The `predict()` method has parameters `input` and `batch_size`, which are explained as follows:
 
 <table>
 <thead>
@@ -141,14 +129,13 @@ The explanations for the methods, parameters, etc., are as follows:
 <tr>
 <td><code>input</code></td>
 <td>Data to be predicted, supporting multiple input types</td>
-<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td><code>Python Var</code>/<code>str</code>/<code>list</code></td>
 <td>
 <ul>
   <li><b>Python variable</b>, such as image data represented by <code>numpy.ndarray</code></li>
   <li><b>File path</b>, such as the local path of an image file: <code>/root/data/img.jpg</code></li>
   <li><b>URL link</b>, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png">Example</a></li>
   <li><b>Local directory</b>, the directory should contain data files to be predicted, such as the local path: <code>/root/data/</code></li>
-  <li><b>Dictionary</b>, the <code>key</code> of the dictionary must correspond to the specific task, such as <code>"img"</code> for image classification tasks. The <code>value</code> of the dictionary supports the above types of data, for example: <code>{"img": "/root/data1"}</code></li>
   <li><b>List</b>, elements of the list must be of the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code></li>
 </ul>
 </td>
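The list input described above can be exercised with a short sketch like the following (the second path is the same placeholder used in the table above, and `res.print()` is assumed to be available on the result objects):

```python
from paddlex import create_model

model = create_model(model_name="PP-FormulaNet-S")
# A list runs inference over several inputs in one call;
# batch_size controls how many images are processed per batch.
output = model.predict(
    input=["general_formula_rec_001.png", "/root/data/img.jpg"],
    batch_size=2,
)
for res in output:
    res.print()
```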
@@ -273,50 +260,54 @@ After executing the above command, PaddleX will validate the dataset and summari
 <details><summary>👉 <b>Details of Validation Results (Click to Expand)</b></summary>
 
 <p>The specific content of the validation result file is:</p>
-<pre><code class="language-bash">{
-  &quot;done_flag&quot;: true,
-  &quot;check_pass&quot;: true,
-  &quot;attributes&quot;: {
-    &quot;train_samples&quot;: 9452,
-    &quot;train_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0109284.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0217434.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0166758.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0022294.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/val_0071799.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0017043.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0026204.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0209202.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/val_0157332.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0232582.png&quot;
+
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "train_samples": 10001,
+    "train_sample_paths": [
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0077809.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0161600.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0002077.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0178425.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0010959.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0079266.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0142495.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0196376.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0185513.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0217146.png"
     ],
-    &quot;val_samples&quot;: 1050,
-    &quot;val_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0070221.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0157901.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0085392.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0196480.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0096180.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0136149.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0143310.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0004560.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0115191.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0015323.png&quot;
+    "val_samples": 501,
+    "val_sample_paths": [
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0053264.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0100521.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0146333.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0072788.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0002022.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0203664.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0082217.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0208199.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0111236.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0204453.png"
     ]
   },
-  &quot;analysis&quot;: {
-    &quot;histogram&quot;: &quot;check_dataset/histogram.png&quot;
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
   },
-  &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_latexocr_dataset_example&quot;,
-  &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;FormulaRecDataset&quot;
+  "dataset_path": "ocr_rec_latexocr_dataset_example",
+  "show_type": "image",
+  "dataset_type": "FormulaRecDataset"
 }
 </code></pre>
 <p>In the above validation results, <code>check_pass</code> being True indicates that the dataset format meets the requirements. Explanations for other indicators are as follows:
-* <code>attributes.train_samples</code>: The number of training samples in this dataset is 9452;
-* <code>attributes.val_samples</code>: The number of validation samples in this dataset is 1050;
-* <code>attributes.train_sample_paths</code>: A list of relative paths to the visualized training samples in this dataset;
-* <code>attributes.val_sample_paths</code>: A list of relative paths to the visualized validation samples in this dataset;</p>
+<ul>
+<li><code>attributes.train_samples</code>: The number of training samples in this dataset is 10001;</li>
+<li><code>attributes.val_samples</code>: The number of validation samples in this dataset is 501;</li>
+<li><code>attributes.train_sample_paths</code>: A list of relative paths to the visualized training samples in this dataset;</li>
+<li><code>attributes.val_sample_paths</code>: A list of relative paths to the visualized validation samples in this dataset;</li>
+</ul>
 <p>Additionally, the dataset verification also analyzes the distribution of sample numbers across all categories in the dataset and generates a distribution histogram (<code>histogram.png</code>):
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/data_prepare/formula_recognition/01.jpg"></p></details>
 
@@ -393,7 +384,7 @@ CheckDataset:
 Model training can be completed with a single command, taking the training of the formula recognition model PP-FormulaNet-S as an example:
 
 ```bash
-python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
+FLAGS_json_format_model=1 python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
     -o Global.mode=train \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
@@ -403,6 +394,15 @@ The following steps are required:
 * Set the mode to model training: `-o Global.mode=train`
 * Specify the path to the training dataset: `-o Global.dataset_dir`.
 Other related parameters can be set by modifying the `Global` and `Train` fields in the `.yaml` configuration file, or adjusted by appending parameters in the command line. For example, to specify training on the first two GPUs: `-o Global.device=gpu:0,1`; to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file instructions for the corresponding task module of the model [PaddleX Common Configuration File Parameters](../../instructions/config_parameters_common.en.md).
+* Except for LaTeX_OCR_rec, the formula recognition models only support exporting in JSON format; therefore, you need to set `FLAGS_json_format_model=1` during training.
+* For the PP-FormulaNet-S, PP-FormulaNet-L, and UniMERNet models, additional Linux packages need to be installed before training. The specific commands are as follows:
+
+```bash
+sudo apt-get update
+sudo apt-get install libmagickwand-dev
+python -m pip install Wand
+```
+
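Putting the notes above together, a hedged sketch of a complete training invocation that combines the JSON-format flag with the device and epoch overrides mentioned earlier (the override values are illustrative):

```bash
FLAGS_json_format_model=1 python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example \
    -o Global.device=gpu:0,1 \
    -o Train.epochs_iters=10
```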
 
 <details><summary>👉 <b>More Details (Click to Expand)</b></summary>
 

docs/module_usage/tutorials/ocr_modules/formula_recognition.md (+45, -43)

@@ -12,31 +12,31 @@ comments: true
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
-<th>Avg-BLEU</th>
+<th>Avg-BLEU(%)</th>
 <th>GPU推理耗时 (ms)</th>
 <th>模型存储大小 (M)</th>
 <th>介绍</th>
 </tr>
 <td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">训练模型</a></td>
-<td>0.8613</td>
+<td>86.13</td>
 <td>2266.96</td>
 <td>1.4 G</td>
 <td>UniMERNet是由上海AI Lab研发的一款公式识别模型。该模型采用Donut Swin作为编码器,MBartDecoder作为解码器,并通过在包含简单公式、复杂公式、扫描捕捉公式和手写公式在内的一百万数据集上进行训练,大幅提升了模型对真实场景公式的识别准确率</td>
 <tr>
 <td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">训练模型</a></td>
-<td>0.8712</td>
+<td>87.12</td>
 <td>202.25</td>
 <td>167.9 M</td>
 <td rowspan="2">PP-FormulaNet 是由百度飞桨视觉团队开发的一款先进的公式识别模型,支持5万个常见LateX源码词汇的识别。PP-FormulaNet-S 版本采用了 PP-HGNetV2-B4 作为其骨干网络,通过并行掩码和模型蒸馏等技术,大幅提升了模型的推理速度,同时保持了较高的识别精度,适用于简单印刷公式、跨行简单印刷公式等场景。而 PP-FormulaNet-L 版本则基于 Vary_VIT_B 作为骨干网络,并在大规模公式数据集上进行了深入训练,在复杂公式的识别方面,相较于PP-FormulaNet-S表现出显著的提升,适用于简单印刷公式、复杂印刷公式、手写公式等场景。 </td>
 
 </tr>
 <td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">训练模型</a></td>
-<td>0.9213</td>
+<td>92.13</td>
 <td>1976.52</td>
 <td>535.2 M</td>
 <tr>
 <td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">训练模型</a></td>
-<td>0.7163</td>
+<td>71.63</td>
 <td>-</td>
 <td>89.7 M</td>
 <td>LaTeX-OCR是一种基于自回归大模型的公式识别算法,通过采用 Hybrid ViT 作为骨干网络,transformer作为解码器,显著提升了公式识别的准确性。</td>
@@ -53,7 +53,6 @@ wheel 包的安装后,几行代码即可完成公式识别模块的推理,
 
 ```python
 from paddlex import create_model
-
 model = create_model(model_name="PP-FormulaNet-S")
 output = model.predict(input="general_formula_rec_001.png", batch_size=1)
 for res in output:
@@ -63,10 +62,11 @@ for res in output:
 ```
 运行后,得到的结果为:
 ```bash
-{'res': {'input_path': 'general_formula_rec_001.png', 'rec_formula': '\\zeta_{0}(\\nu)=-{\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z{\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}}\\ \\ {vec\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}
+{'res': {'input_path': 'general_formula_rec_001.png', 'page_index': None, 'rec_formula': '\\zeta_{0}(\\nu)=-{\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z{\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}}\\ \\ {vec\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}
 ```
 运行结果参数含义如下:
 - `input_path`:表示输入待预测公式图像的路径
+- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `None`
 - `rec_formula`:表示公式图像的预测LaTeX源码
 
 
@@ -111,7 +111,7 @@ sudo apt-get install texlive texlive-latex-base texlive-latex-extra -y
 
 * 其中,`model_name` 必须指定,指定 `model_name` 后,默认使用 PaddleX 内置的模型参数,在此基础上,指定 `model_dir` 时,使用用户自定义的模型。
 
-* 调用文本识别模型的 `predict()` 方法进行推理预测,`predict()` 方法参数有 `input` 和 `batch_size`,具体说明如下:
+* 调用公式识别模型的 `predict()` 方法进行推理预测,`predict()` 方法参数有 `input` 和 `batch_size`,具体说明如下:
 
 <table>
 <thead>
@@ -126,14 +126,13 @@ sudo apt-get install texlive texlive-latex-base texlive-latex-extra -y
 <tr>
 <td><code>input</code></td>
 <td>待预测数据,支持多种输入类型</td>
-<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td><code>Python Var</code>/<code>str</code>/<code>list</code></td>
 <td>
 <ul>
   <li><b>Python变量</b>,如<code>numpy.ndarray</code>表示的图像数据</li>
   <li><b>文件路径</b>,如图像文件的本地路径:<code>/root/data/img.jpg</code></li>
   <li><b>URL链接</b>,如图像文件的网络URL:<a href = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png">示例</a></li>
   <li><b>本地目录</b>,该目录下需包含待预测数据文件,如本地路径:<code>/root/data/</code></li>
-  <li><b>字典</b>,字典的<code>key</code>需与具体任务对应,如图像分类任务对应<code>\"img\"</code>,字典的<code>val</code>支持上述类型数据,例如:<code>{\"img\": \"/root/data1\"}</code></li>
   <li><b>列表</b>,列表元素需为上述类型数据,如<code>[numpy.ndarray, numpy.ndarray]</code>,<code>[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]</code>,<code>[\"/root/data1\", \"/root/data2\"]</code>,<code>[{\"img\": \"/root/data1\"}, {\"img\": \"/root/data2/img.jpg\"}]</code></li>
 </ul>
 </td>
@@ -260,45 +259,48 @@ python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.ya
 <details><summary>👉 <b>校验结果详情(点击展开)</b></summary>
 
 <p>校验结果文件具体内容为:</p>
-<pre><code class="language-bash">{
-  &quot;done_flag&quot;: true,
-  &quot;check_pass&quot;: true,
-  &quot;attributes&quot;: {
-    &quot;train_samples&quot;: 9452,
-    &quot;train_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0109284.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0217434.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0166758.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0022294.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/val_0071799.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0017043.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0026204.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0209202.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/val_0157332.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0232582.png&quot;
+
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "train_samples": 10001,
+    "train_sample_paths": [
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0077809.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0161600.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0002077.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0178425.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0010959.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0079266.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0142495.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0196376.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0185513.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/train_0217146.png"
     ],
-    &quot;val_samples&quot;: 1050,
-    &quot;val_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0070221.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0157901.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0085392.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0196480.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0096180.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0136149.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0143310.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0004560.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0115191.png&quot;,
-      &quot;../dataset/ocr_rec_latexocr_dataset_example/images/train_0015323.png&quot;
+    "val_samples": 501,
+    "val_sample_paths": [
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0053264.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0100521.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0146333.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0072788.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0002022.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0203664.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0082217.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0208199.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0111236.png",
+      "..\/dataset\/ocr_rec_latexocr_dataset_example\/images\/val_0204453.png"
     ]
   },
-  &quot;analysis&quot;: {
-    &quot;histogram&quot;: &quot;check_dataset/histogram.png&quot;
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
   },
-  &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_latexocr_dataset_example&quot;,
-  &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;FormulaRecDataset&quot;
+  "dataset_path": "ocr_rec_latexocr_dataset_example",
+  "show_type": "image",
+  "dataset_type": "FormulaRecDataset"
 }
 </code></pre>
+
 <p>上述校验结果中,check_pass 为 True 表示数据集格式符合要求,其他部分指标的说明如下:</p>
 <ul>
 <li><code>attributes.train_samples</code>:该数据集训练集样本数量为 9452;</li>

docs/module_usage/tutorials/ocr_modules/text_recognition.en.md (+43, -20)

@@ -290,8 +290,13 @@ For more information on using PaddleX's single-model inference APIs, please refe
 
 After running, the result obtained is:
 ```bash
-{'input_path': 'general_ocr_rec_001.png', 'rec_text': 'Oasis Shigewei Garden Apartment', 'rec_score': 0.9875298738479614}
+{'res': {'input_path': 'general_ocr_rec_001.png', 'page_index': None, 'rec_text': '绿洲仕格维花园公寓', 'rec_score': 0.9875497817993164}}
 ````
+The meanings of the parameters in the running results are as follows:
+- `input_path`: Represents the path of the text line image to be predicted.
+- `page_index`: If the input is a PDF file, this indicates the current page number of the PDF; otherwise, it is `None`.
+- `rec_text`: Represents the predicted text of the text line image.
+- `rec_score`: Represents the prediction confidence of the text line image.
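For reference, a minimal sketch of the call that yields a result like the one above (the model name `PP-OCRv4_mobile_rec` and the `print`/`save_to_json` helpers are assumptions based on the conventions used elsewhere in these docs):

```python
from paddlex import create_model

model = create_model(model_name="PP-OCRv4_mobile_rec")
output = model.predict(input="general_ocr_rec_001.png", batch_size=1)
for res in output:
    res.print()                    # shows input_path, page_index, rec_text and rec_score
    res.save_to_json("./output/")  # persist the prediction as JSON
```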
 
 The visualized image is as follows:
 
@@ -314,14 +319,13 @@ In the above Python script, the following steps are executed:
 <tr>
 <td><code>x</code></td>
 <td>Data to be predicted, supporting multiple input types</td>
-<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td><code>Python Var</code>/<code>str</code>/<code>list</code></td>
 <td>
 <ul>
   <li><b>Python variable</b>, such as image data represented by <code>numpy.ndarray</code></li>
   <li><b>File path</b>, such as the local path of an image file: <code>/root/data/img.jpg</code></li>
   <li><b>URL link</b>, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png">Example</a></li>
   <li><b>Local directory</b>, this directory should contain the data files to be predicted, such as the local path: <code>/root/data/</code></li>
-  <li><b>Dictionary</b>, the <code>key</code> of the dictionary should correspond to the specific task, such as <code>"img"</code> for image classification tasks. The <code>value</code> of the dictionary supports the above types of data, for example: <code>{"img": "/root/data1"}</code></li>
   <li><b>List</b>, the elements of the list should be the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code></li>
 </ul>
 </td>
@@ -455,29 +459,48 @@ After executing the above command, PaddleX will validate the dataset and summari
 <details><summary>👉 <b>Validation Result Details (Click to Expand)</b></summary>
 
 <p>The specific content of the validation result file is:</p>
-<pre><code class="language-bash">{
-  &quot;done_flag&quot;: true,
-  &quot;check_pass&quot;: true,
-  &quot;attributes&quot;: {
-    &quot;train_samples&quot;: 4468,
-    &quot;train_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_dataset_examples/images/train_word_1.png&quot;,
-      &quot;../dataset/ocr_rec_dataset_examples/images/train_word_10.png&quot;
+
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "train_samples": 4468,
+    "train_sample_paths": [
+      "check_dataset\/demo_img\/train_word_1.png",
+      "check_dataset\/demo_img\/train_word_2.png",
+      "check_dataset\/demo_img\/train_word_3.png",
+      "check_dataset\/demo_img\/train_word_4.png",
+      "check_dataset\/demo_img\/train_word_5.png",
+      "check_dataset\/demo_img\/train_word_6.png",
+      "check_dataset\/demo_img\/train_word_7.png",
+      "check_dataset\/demo_img\/train_word_8.png",
+      "check_dataset\/demo_img\/train_word_9.png",
+      "check_dataset\/demo_img\/train_word_10.png"
     ],
-    &quot;val_samples&quot;: 2077,
-    &quot;val_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_dataset_examples/images/val_word_1.png&quot;,
-      &quot;../dataset/ocr_rec_dataset_examples/images/val_word_10.png&quot;
+    "val_samples": 2077,
+    "val_sample_paths": [
+      "check_dataset\/demo_img\/val_word_1.png",
+      "check_dataset\/demo_img\/val_word_2.png",
+      "check_dataset\/demo_img\/val_word_3.png",
+      "check_dataset\/demo_img\/val_word_4.png",
+      "check_dataset\/demo_img\/val_word_5.png",
+      "check_dataset\/demo_img\/val_word_6.png",
+      "check_dataset\/demo_img\/val_word_7.png",
+      "check_dataset\/demo_img\/val_word_8.png",
+      "check_dataset\/demo_img\/val_word_9.png",
+      "check_dataset\/demo_img\/val_word_10.png"
     ]
   },
-  &quot;analysis&quot;: {
-    &quot;histogram&quot;: &quot;check_dataset/histogram.png&quot;
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
   },
-  &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_dataset_examples&quot;,
-  &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;MSTextRecDataset&quot;
+  "dataset_path": "ocr_rec_dataset_examples",
+  "show_type": "image",
+  "dataset_type": "MSTextRecDataset"
 }
 </code></pre>
+
 <p>In the above validation result, <code>check_pass</code> being <code>true</code> indicates that the dataset format meets the requirements. Explanations for other indicators are as follows:</p>
 <ul>
 <li><code>attributes.train_samples</code>: The number of training set samples in this dataset is 4468;</li>

docs/module_usage/tutorials/ocr_modules/text_recognition.md (+57, -37)

@@ -20,7 +20,7 @@ comments: true
 </tr>
 <tr>
 <td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_doc_pretrained.pdparams">训练模型</a></td>
 <td>81.53</td>
 <td></td>
 <td></td>
@@ -45,7 +45,7 @@ PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="">训练模型</a><
 </tr>
 <tr>
 <td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/en_PP-OCRv4_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>70.39</td>
 <td></td>
 <td></td>
@@ -73,7 +73,7 @@ en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_doc_pretrained.pdparams">训练模型</a></td>
 <td>81.53</td>
 <td></td>
 <td></td>
@@ -98,7 +98,7 @@ PP-OCRv4_server_rec_doc_infer.tar">推理模型</a>/<a href="">训练模型</a><
 </tr>
 <tr>
 <td>PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>72.96</td>
 <td></td>
 <td></td>
@@ -165,7 +165,7 @@ SVTRv2 是一种由复旦大学视觉与学习实验室(FVL)的OpenOCR团队
 </tr>
 <tr>
 <td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/en_PP-OCRv4_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td> 70.39</td>
 <td></td>
 <td></td>
@@ -174,7 +174,7 @@ en_PP-OCRv4_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>en_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/en_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>70.69</td>
 <td></td>
 <td></td>
@@ -183,6 +183,8 @@ en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 </table>
 
+<p><b>注:以上精度指标的评估集是 PaddleX 自建的英文数据集。 所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
+
 * <b>多语言识别模型</b>
 
 <table>
@@ -196,7 +198,7 @@ en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>korean_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-korean_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+korean_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/korean_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>60.21</td>
 <td></td>
 <td></td>
@@ -205,7 +207,7 @@ korean_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</
 </tr>
 <tr>
 <td>japan_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-japan_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+japan_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/japan_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>45.69</td>
 <td></td>
 <td></td>
@@ -214,7 +216,7 @@ japan_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a
 </tr>
 <tr>
 <td>chinese_cht_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-chinese_cht_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+chinese_cht_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/chinese_cht_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>82.06</td>
 <td></td>
 <td></td>
@@ -223,7 +225,7 @@ chinese_cht_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模
 </tr>
 <tr>
 <td>te_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-te_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+te_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/te_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>95.88</td>
 <td></td>
 <td></td>
@@ -232,7 +234,7 @@ te_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>ka_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-ka_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+ka_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ka_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>96.96</td>
 <td></td>
 <td></td>
@@ -241,7 +243,7 @@ ka_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>ta_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-ta_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+ta_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/ta_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>76.83</td>
 <td></td>
 <td></td>
@@ -250,7 +252,7 @@ ta_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 </tr>
 <tr>
 <td>latin_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-latin_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+latin_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/latin_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>76.93</td>
 <td></td>
 <td></td>
@@ -259,7 +261,7 @@ latin_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a
 </tr>
 <tr>
 <td>arabic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-arabic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+arabic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/arabic_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>73.55</td>
 <td></td>
 <td></td>
@@ -268,7 +270,7 @@ arabic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</
 </tr>
 <tr>
 <td>cyrillic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-cyrillic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+cyrillic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/cyrillic_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>94.28</td>
 <td></td>
 <td></td>
@@ -277,7 +279,7 @@ cyrillic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型
 </tr>
 <tr>
 <td>devanagari_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
-devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/devanagari_PP-OCRv3_mobile_rec_pretrained.pdparams">训练模型</a></td>
 <td>96.44</td>
 <td></td>
 <td></td>
@@ -285,7 +287,7 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模
 <td>基于PP-OCRv3识别模型训练得到的超轻量梵文字母识别模型,支持梵文字母、数字识别</td>
 </tr>
 </table>
-<p><b>注:以上精度指标的评估集是 PaddleX 自建的多语种数据集。 所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
+<p><b>注:以上精度指标的评估集是 PaddleX 自建的多语种数据集。 </b></p>
 </details>
 
 ## 三、快速集成
@@ -303,11 +305,12 @@ for res in output:
 
 运行后,得到的结果为:
 ```bash
-{'res': "{'input_path': 'general_ocr_rec_001.png', 'rec_text': '绿洲仕格维花园公寓', 'rec_score': 0.9875510334968567}"}
+{'res': {'input_path': 'general_ocr_rec_001.png', 'page_index': None, 'rec_text': '绿洲仕格维花园公寓', 'rec_score': 0.9875497817993164}}
 ```
 
 运行结果参数含义如下:
 - `input_path`:表示输入待预测文本行图像的路径
+- `page_index`:如果输入是PDF文件,则表示当前是PDF的第几页,否则为 `None`
 - `rec_text`:表示文本行图像的预测文本
 - `rec_score`:表示文本行图像的预测置信度
 
@@ -362,14 +365,13 @@ for res in output:
 <tr>
 <td><code>input</code></td>
 <td>待预测数据,支持多种输入类型</td>
-<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td><code>Python Var</code>/<code>str</code>/<code>list</code></td>
 <td>
 <ul>
   <li><b>Python变量</b>,如<code>numpy.ndarray</code>表示的图像数据</li>
   <li><b>文件路径</b>,如图像文件的本地路径:<code>/root/data/img.jpg</code></li>
   <li><b>URL链接</b>,如图像文件的网络URL:<a href = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_rec_001.png">示例</a></li>
   <li><b>本地目录</b>,该目录下需包含待预测数据文件,如本地路径:<code>/root/data/</code></li>
-  <li><b>字典</b>,字典的<code>key</code>需与具体任务对应,如图像分类任务对应<code>\"img\"</code>,字典的<code>val</code>支持上述类型数据,例如:<code>{\"img\": \"/root/data1\"}</code></li>
   <li><b>列表</b>,列表元素需为上述类型数据,如<code>[numpy.ndarray, numpy.ndarray]</code>,<code>[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]</code>,<code>[\"/root/data1\", \"/root/data2\"]</code>,<code>[{\"img\": \"/root/data1\"}, {\"img\": \"/root/data2/img.jpg\"}]</code></li>
 </ul>
 </td>
@@ -496,27 +498,45 @@ python main.py -c paddlex/configs/modules/text_recognition/PP-OCRv4_mobile_rec.y
 <details><summary>👉 <b>校验结果详情(点击展开)</b></summary>
 
 <p>校验结果文件具体内容为:</p>
-<pre><code class="language-bash">{
-  &quot;done_flag&quot;: true,
-  &quot;check_pass&quot;: true,
-  &quot;attributes&quot;: {
-    &quot;train_samples&quot;: 4468,
-    &quot;train_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_dataset_examples/images/train_word_1.png&quot;,
-      &quot;../dataset/ocr_rec_dataset_examples/images/train_word_10.png&quot;
+
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "train_samples": 4468,
+    "train_sample_paths": [
+      "check_dataset\/demo_img\/train_word_1.png",
+      "check_dataset\/demo_img\/train_word_2.png",
+      "check_dataset\/demo_img\/train_word_3.png",
+      "check_dataset\/demo_img\/train_word_4.png",
+      "check_dataset\/demo_img\/train_word_5.png",
+      "check_dataset\/demo_img\/train_word_6.png",
+      "check_dataset\/demo_img\/train_word_7.png",
+      "check_dataset\/demo_img\/train_word_8.png",
+      "check_dataset\/demo_img\/train_word_9.png",
+      "check_dataset\/demo_img\/train_word_10.png"
     ],
-    &quot;val_samples&quot;: 2077,
-    &quot;val_sample_paths&quot;: [
-      &quot;../dataset/ocr_rec_dataset_examples/images/val_word_1.png&quot;,
-      &quot;../dataset/ocr_rec_dataset_examples/images/val_word_10.png&quot;
+    "val_samples": 2077,
+    "val_sample_paths": [
+      "check_dataset\/demo_img\/val_word_1.png",
+      "check_dataset\/demo_img\/val_word_2.png",
+      "check_dataset\/demo_img\/val_word_3.png",
+      "check_dataset\/demo_img\/val_word_4.png",
+      "check_dataset\/demo_img\/val_word_5.png",
+      "check_dataset\/demo_img\/val_word_6.png",
+      "check_dataset\/demo_img\/val_word_7.png",
+      "check_dataset\/demo_img\/val_word_8.png",
+      "check_dataset\/demo_img\/val_word_9.png",
+      "check_dataset\/demo_img\/val_word_10.png"
     ]
   },
-  &quot;analysis&quot;: {
-    &quot;histogram&quot;: &quot;check_dataset/histogram.png&quot;
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
   },
-  &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_dataset_examples&quot;,
-  &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;MSTextRecDataset&quot;
+  "dataset_path": "ocr_rec_dataset_examples",
+  "show_type": "image",
+  "dataset_type": "MSTextRecDataset"
 }
 </code></pre>
 <p>上述校验结果中,<code>check_pass</code> 为 <code>true</code> 表示数据集格式符合要求,其他部分指标的说明如下:</p>

docs/module_usage/tutorials/video_modules/video_classification.en.md (+39, -30)

@@ -18,7 +18,7 @@ The Video Classification Module is a crucial component in a computer vision syst
 <th>Description</th>
 </tr>
 <tr>
-<td>PPTSM_ResNet50_k400_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PPTSM_ResNet50_k400_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSM_ResNet50_k400_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSM-R50_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSM-R50_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSM-R50_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>74.36</td>
 <td>93.4 M</td>
 <td rowspan="1">
@@ -33,16 +33,17 @@ PP-TSM is a video classification model developed by Baidu PaddlePaddle's Vision
 <td rowspan="2">PP-TSMv2 is a lightweight video classification model optimized based on the CPU-oriented model PP-LCNetV2. It undergoes model tuning in seven aspects: backbone network and pre-trained model selection, data augmentation, TSM module tuning, input frame number optimization, decoding speed optimization, DML distillation, and LTA module. Under the center crop evaluation method, it achieves an accuracy of 75.16%, with an inference speed of only 456ms on the CPU for a 10-second video input.</td>
 </tr>
 <tr>
-<td>PPTSMv2_LCNet_k400_16frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PPTSMv2_LCNet_k400_16frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSMv2_LCNet_k400_16frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSMv2-LCNetV2_16frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSMv2-LCNetV2_16frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSMv2-LCNetV2_16frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>73.11</td>
 <td>22.5 M</td>
 </tr>
 
 </table>
 
-<p><b>Note: The above accuracy metrics refer to Top-1 Accuracy on the <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> validation set. </b><b>All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speeds are based on Intel® Xeon® Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.</b></p></details>
+<p><b>Note: The above accuracy metrics refer to Top-1 Accuracy on the <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> validation set. </b></p></details>
+
+## III. Quick Integration
 
-## <span id="lable">III. Quick Integration</span>
 > ❗ Before quick integration, please install the PaddleX wheel package. For detailed instructions, refer to the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
 
 After installing the wheel package, you can complete video classification module inference with just a few lines of code. You can switch between models in this module freely, and you can also integrate the model inference of the video classification module into your project. Before running the following code, please download the [demo video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/general_video_classification_001.mp4) to your local machine.
@@ -73,7 +74,6 @@ The visualization video is as follows:
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/video_classification/general_video_classification_001.jpg" alt="Visualization Image">
 
-Note: Due to network issues, the above URL may not be accessible. If you need to access the content of this link, please check the validity of the link and try again. If you encounter any problems, it might be related to the link itself or the network connection.
 
 The Python script above performs the following steps:
 * `create_model` instantiates a video classification model (here using `PP-TSMv2-LCNetV2_8frames_uniform` as an example), with specific explanations as follows:
@@ -102,6 +102,13 @@ The Python script above performs the following steps:
 <td>None</td>
 <td>None</td>
 </tr>
+<tr>
+<td><code>topk</code></td>
+<td>The top <code>topk</code> categories and their classification probabilities in the prediction result; if not specified, the default configuration of the official PaddleX model is used</td>
+<td><code>int</code></td>
+<td>None</td>
+<td><code>1</code></td>
+</tr>
 </table>
 
 * The `predict` method of the video classification model is called for inference and prediction. The parameter of the `predict` method is `input`, which is used to input the data to be predicted and supports multiple input types, with specific explanations as follows:
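Following the parameter table above, a hedged sketch of passing `topk` when instantiating the model (the model name and demo video are the ones used in this document; the exact keyword handling in `create_model` is an assumption):

```python
from paddlex import create_model

# Request the top-5 categories instead of the default top-1.
model = create_model(model_name="PP-TSMv2-LCNetV2_8frames_uniform", topk=5)
output = model.predict(input="general_video_classification_001.mp4", batch_size=1)
for res in output:
    res.print()
```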
@@ -256,42 +263,44 @@ After executing the above command, PaddleX will validate the dataset and summari
 
 <details><summary>👉 <b>Validation Results Details (Click to Expand)</b></summary>
 
-<pre><code class="language-bash">{ "done_flag": true,
+<pre><code class="language-bash">
+{
+  "done_flag": true,
   "check_pass": true,
   "attributes": {
-    "label_file": "../../dataset/k400_examples/label.txt",
+    "label_file": "..\/..\/dataset\/k400_examples\/label.txt",
     "num_classes": 5,
     "train_samples": 250,
     "train_sample_paths": [
-      "check_dataset/../../dataset/k400_examples/videos/Wary2ON3aSo_000079_000089.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/_LHpfh0rXjk_000012_000022.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/dyoiNbn80q0_000039_000049.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/brBw6cFwock_000049_000059.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/-o4X5Z_Isyc_000085_000095.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/e24p-4W3TiU_000011_000021.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/2Grg_zwmYZE_000004_000014.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/aZY_0UqRNgA_000098_000108.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/WZlsi4nQHOo_000025_000035.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/rRh-lkFj4Tw_000001_000011.mp4"
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/Wary2ON3aSo_000079_000089.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/_LHpfh0rXjk_000012_000022.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/dyoiNbn80q0_000039_000049.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/brBw6cFwock_000049_000059.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/-o4X5Z_Isyc_000085_000095.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/e24p-4W3TiU_000011_000021.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/2Grg_zwmYZE_000004_000014.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/aZY_0UqRNgA_000098_000108.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/WZlsi4nQHOo_000025_000035.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/rRh-lkFj4Tw_000001_000011.mp4"
     ],
     "val_samples": 50,
     "val_sample_paths": [
-      "check_dataset/../../dataset/k400_examples/videos/7Mga5kywfU4.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/w5UCdQ2NmfY.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/Qbo_tnzfjOY.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/LgW8pMDtylE.mkv",
-      "check_dataset/../../dataset/k400_examples/videos/BY0883Dvt1c.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/PHQkMPu-KNo.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/7LSJ2Ryv1a8.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/oBYZWvlI8Uk.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/dpn2eg9O3Rs.mkv",
-      "check_dataset/../../dataset/k400_examples/videos/hXtsZAaZ3yc.mkv"
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/7Mga5kywfU4.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/w5UCdQ2NmfY.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/Qbo_tnzfjOY.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/LgW8pMDtylE.mkv",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/BY0883Dvt1c.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/PHQkMPu-KNo.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/7LSJ2Ryv1a8.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/oBYZWvlI8Uk.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/dpn2eg9O3Rs.mkv",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/hXtsZAaZ3yc.mkv"
     ]
   },
   "analysis": {
-    "histogram": "check_dataset/histogram.png"
+    "histogram": "check_dataset\/histogram.png"
   },
-  "dataset_path": "./dataset/k400_examples",
+  "dataset_path": "k400_examples",
   "show_type": "video",
   "dataset_type": "VideoClsDataset"
 }
@@ -389,7 +398,7 @@ Similar to model training, the following steps are required:
 
 * Specify the path of the model's `.yaml` configuration file (here it is `PP-TSMv2-LCNetV2_8frames_uniform.yaml`)
 * Specify the mode as model evaluation: `-o Global.mode=evaluate`
-* Specify the path of the validation dataset: `-o Global.dataset_dir`. Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration. Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
+* Specify the path of the validation dataset: `-o Global.dataset_dir`.  Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
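A hedged sketch of the evaluation command the steps above describe, assuming the config file follows the module layout used elsewhere in this document and reusing the example dataset path:

```bash
python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
    -o Global.mode=evaluate \
    -o Global.dataset_dir=./dataset/k400_examples
```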
 
 <details><summary>👉 <b>More Details (Click to Expand)</b></summary>
 

docs/module_usage/tutorials/video_modules/video_classification.md (+26, -25)

@@ -265,42 +265,43 @@ python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_
 <details><summary>👉 <b>校验结果详情(点击展开)</b></summary>
 <p>校验结果文件具体内容为:</p>
 <pre><code class="language-bash">
-{ "done_flag": true,
+{
+  "done_flag": true,
   "check_pass": true,
   "attributes": {
-    "label_file": "../../dataset/k400_examples/label.txt",
+    "label_file": "..\/..\/dataset\/k400_examples\/label.txt",
     "num_classes": 5,
     "train_samples": 250,
     "train_sample_paths": [
-      "check_dataset/../../dataset/k400_examples/videos/Wary2ON3aSo_000079_000089.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/_LHpfh0rXjk_000012_000022.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/dyoiNbn80q0_000039_000049.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/brBw6cFwock_000049_000059.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/-o4X5Z_Isyc_000085_000095.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/e24p-4W3TiU_000011_000021.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/2Grg_zwmYZE_000004_000014.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/aZY_0UqRNgA_000098_000108.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/WZlsi4nQHOo_000025_000035.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/rRh-lkFj4Tw_000001_000011.mp4"
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/Wary2ON3aSo_000079_000089.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/_LHpfh0rXjk_000012_000022.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/dyoiNbn80q0_000039_000049.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/brBw6cFwock_000049_000059.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/-o4X5Z_Isyc_000085_000095.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/e24p-4W3TiU_000011_000021.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/2Grg_zwmYZE_000004_000014.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/aZY_0UqRNgA_000098_000108.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/WZlsi4nQHOo_000025_000035.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/rRh-lkFj4Tw_000001_000011.mp4"
     ],
     "val_samples": 50,
     "val_sample_paths": [
-      "check_dataset/../../dataset/k400_examples/videos/7Mga5kywfU4.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/w5UCdQ2NmfY.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/Qbo_tnzfjOY.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/LgW8pMDtylE.mkv",
-      "check_dataset/../../dataset/k400_examples/videos/BY0883Dvt1c.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/PHQkMPu-KNo.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/7LSJ2Ryv1a8.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/oBYZWvlI8Uk.mp4",
-      "check_dataset/../../dataset/k400_examples/videos/dpn2eg9O3Rs.mkv",
-      "check_dataset/../../dataset/k400_examples/videos/hXtsZAaZ3yc.mkv"
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/7Mga5kywfU4.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/w5UCdQ2NmfY.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/Qbo_tnzfjOY.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/LgW8pMDtylE.mkv",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/BY0883Dvt1c.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/PHQkMPu-KNo.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/7LSJ2Ryv1a8.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/oBYZWvlI8Uk.mp4",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/dpn2eg9O3Rs.mkv",
+      "check_dataset\/..\/..\/dataset\/k400_examples\/videos\/hXtsZAaZ3yc.mkv"
     ]
   },
   "analysis": {
-    "histogram": "check_dataset/histogram.png"
+    "histogram": "check_dataset\/histogram.png"
   },
-  "dataset_path": "./dataset/k400_examples",
+  "dataset_path": "k400_examples",
   "show_type": "video",
   "dataset_type": "VideoClsDataset"
 }
@@ -405,7 +406,7 @@ python main.py -c  paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2
 <details><summary>👉 <b>更多说明(点击展开)</b></summary>
 
 <p>在模型评估时,需要指定模型权重文件路径,每个配置文件中都内置了默认的权重保存路径,如需要改变,只需要通过追加命令行参数的形式进行设置即可,如<code>-o Evaluate.weight_path=./output/best_model/best_model.pdparams</code>。</p>
-<p>在完成模型评估后,会产出<code>evaluate_result.json,其记录了</code>评估的结果,具体来说,记录了评估任务是否正常完成,以及模型的评估指标,包含 val.top1、val.top5;</p></details>
+<p>在完成模型评估后,会产出<code>evaluate_result.json</code>,其记录了评估的结果,具体来说,记录了评估任务是否正常完成,以及模型的评估指标,包含 val.top1、val.top5;</p></details>
 
 ### <b>4.4 模型推理和模型集成</b>
 

+ 7 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.en.md

@@ -490,6 +490,13 @@ In the above Python script, the following steps are executed:
 <td><code>None</code></td>
 </tr>
 <tr>
+<td><code>config</code></td>
+<td>Specific configuration information for the pipeline (if set together with <code>pipeline</code>, it takes precedence over <code>pipeline</code>, and the pipeline name inside it must match <code>pipeline</code>).
+</td>
+<td><code>dict[str, Any]</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
 <td><code>device</code></td>
 <td>The device used for production line inference. It supports specifying specific GPU card numbers, such as "gpu:0", other hardware card numbers, such as "npu:0", or CPU, such as "cpu".</td>
 <td><code>str</code></td>
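
For reference, a minimal Python sketch of how the new `config` argument documented above might be used. The `OCR.yaml` file name, the loading step, and the `pipeline_name` key are assumptions for illustration only; the dict is assumed to mirror the pipeline's own YAML configuration, and (as the table states) the pipeline name it carries must match the `pipeline` argument.

```python
# Illustrative sketch, not part of this PR: passing an explicit `config` dict
# to create_pipeline(). "OCR.yaml" and the "pipeline_name" key are assumptions.
import yaml  # PyYAML

from paddlex import create_pipeline

with open("OCR.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)  # a dict mirroring the pipeline's YAML configuration

# Documented constraint: the pipeline name in the config must match `pipeline`.
assert cfg.get("pipeline_name") == "OCR"

pipeline = create_pipeline(pipeline="OCR", config=cfg, device="gpu:0")
for res in pipeline.predict(input="general_ocr_002.png"):  # local demo image path
    res.print()
```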

+ 6 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md

@@ -506,6 +506,12 @@ for res in output:
 <td><code>None</code></td>
 </tr>
 <tr>
+<td><code>config</code></td>
+<td>产线具体的配置信息(如果和<code>pipeline</code>同时设置,优先级高于<code>pipeline</code>,且要求产线名和<code>pipeline</code>一致)。</td>
+<td><code>dict[str, Any]</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
 <td><code>device</code></td>
 <td>产线推理设备。支持指定GPU具体卡号,如“gpu:0”,其他硬件具体卡号,如“npu:0”,CPU如“cpu”。</td>
 <td><code>str</code></td>

+ 249 - 25
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.en.md

@@ -31,17 +31,19 @@ The formula recognition pipeline is designed to solve formula recognition tasks
 </thead>
 <tbody>
 <tr>
-<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Trained Model</a></td>
-<td>92.6</td>
-<td>115.126</td>
-<td>3827.25</td>
-<td>470.2M</td>
+<td>PP-LCNet_x1_0_doc_ori</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-LCNet_x1_0_doc_ori_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-LCNet_x1_0_doc_ori_pretrained.pdparams">Training Model</a></td>
+<td>99.06</td>
+<td>3.84845</td>
+<td>9.23735</td>
+<td>7</td>
+<td>A document image classification model based on PP-LCNet_x1_0, with four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees.</td>
 </tr>
 </tbody>
 </table>
-<b>Note: The accuracy metrics are evaluated on a self-built dataset covering ID cards and documents, with 1000 images. GPU inference time is based on an NVIDIA Tesla T4 machine, with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and FP32 precision.</b>
+<b>Note: The evaluation dataset for the above accuracy metrics is a self-built dataset covering multiple scenarios such as certificates and documents, with 1,000 images. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
 
-<p><b>Text Image Unwarping Module (Optional):</b></p>
+
+<p><b>Text Image Correction Module (Optional):</b></p>
 
 <table>
 <thead>
@@ -54,18 +56,237 @@ The formula recognition pipeline is designed to solve formula recognition tasks
 </thead>
 <tbody>
 <tr>
-<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
-<td>0.8821</td>
-<td>0.0823</td>
-<td>40.01</td>
-<td>-</td>
+<td>UVDoc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UVDoc_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UVDoc_pretrained.pdparams">Training Model</a></td>
+<td>0.179</td>
+<td>30.3 M</td>
+<td>High-precision text image correction model</td>
+</tr>
+</tbody>
+</table>
+<b>Note: The accuracy metrics of the model are measured from the <a href="https://www3.cs.stonybrook.edu/~cvl/docunet.html">DocUNet benchmark</a>.</b>
+
+<p><b>Layout Detection Module (Optional):</b></p>
+
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>CPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PP-DocLayout-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-L_pretrained.pdparams">Training Model</a></td>
+<td>90.4</td>
+<td>34.5252</td>
+<td>1454.27</td>
+<td>123.76 M</td>
+<td>A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L.</td>
+</tr>
+<tr>
+<td>PP-DocLayout-M</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-M_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-M_pretrained.pdparams">Training Model</a></td>
+<td>75.2</td>
+<td>15.9</td>
+<td>160.1</td>
+<td>22.578</td>
+<td>A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L.</td>
+</tr>
+<tr>
+<td>PP-DocLayout-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-DocLayout-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-DocLayout-S_pretrained.pdparams">Training Model</a></td>
+<td>70.9</td>
+<td>13.8</td>
+<td>46.7</td>
+<td>4.834</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S.</td>
+</tr>
+</tbody>
+</table>
+
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 500 common document-type images of Chinese and English papers, magazines, contracts, books, exams, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+> ❗ The above list covers the <b>3 core models</b> that are the key focus of the layout detection module. The module supports a total of <b>11 models</b>, including several predefined models with different category sets. The complete model list is as follows:
+
+<details><summary> 👉 Details of Model List</summary>
+
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>CPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet_layout_1x_table</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_table_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_table_pretrained.pdparams">Training Model</a></td>
+<td>97.5</td>
+<td>12.623</td>
+<td>90.8934</td>
+<td>7.4 M</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset using PicoDet-1x, capable of detecting table regions.</td>
+</tr>
+</table>
+
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout table area detection dataset by PaddleOCR, containing 7835 Chinese and English document images with tables. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>3-Class Layout Detection Model, including Table, Image, and Stamp</b>
+
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>CPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet-S_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>88.2</td>
+<td>13.5</td>
+<td>45.8</td>
+<td>4.8</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
+</tr>
+<tr>
+<td>PicoDet-L_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>89.0</td>
+<td>15.7</td>
+<td>159.8</td>
+<td>22.6</td>
+<td>A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
+</tr>
+<tr>
+<td>RT-DETR-H_layout_3cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_3cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_3cls_pretrained.pdparams">Training Model</a></td>
+<td>95.8</td>
+<td>114.6</td>
+<td>3832.6</td>
+<td>470.1</td>
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
+</tr>
+</table>
+
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 1154 common document images of Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>5-Class English Document Area Detection Model, including Text, Title, Table, Image, and List</b>
+
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>CPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet_layout_1x</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet_layout_1x_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet_layout_1x_pretrained.pdparams">Training Model</a></td>
+<td>97.8</td>
+<td>13.0</td>
+<td>91.3</td>
+<td>7.4</td>
+<td>A high-efficiency English document layout area localization model trained on the PubLayNet dataset using PicoDet-1x.</td>
+</tr>
+</table>
+
+<b>Note: The evaluation dataset for the above precision metrics is the <a href="https://developer.ibm.com/exchanges/data/all/publaynet/">PubLayNet</a> dataset, containing 11245 English document images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+
+* <b>17-Class Area Detection Model, including 17 common layout categories: Paragraph Title, Image, Text, Number, Abstract, Content, Figure Caption, Formula, Table, Table Caption, References, Document Title, Footnote, Header, Algorithm, Footer, and Stamp</b>
+
+<table>
+<thead>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>mAP(0.5) (%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>CPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PicoDet-S_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-S_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-S_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>87.4</td>
+<td>13.6</td>
+<td>46.2</td>
+<td>4.8</td>
+<td>A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S.</td>
+</tr>
+
+<tr>
+<td>PicoDet-L_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PicoDet-L_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PicoDet-L_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>89.0</td>
+<td>17.2</td>
+<td>160.2</td>
+<td>22.6</td>
+<td>A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L.</td>
+</tr>
+
+<tr>
+<td>RT-DETR-H_layout_17cls</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/RT-DETR-H_layout_17cls_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/RT-DETR-H_layout_17cls_pretrained.pdparams">Training Model</a></td>
+<td>98.3</td>
+<td>115.1</td>
+<td>3827.2</td>
+<td>470.2</td>
+<td>A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H.</td>
+</tr>
+</tbody>
+</table>
+
+<b>Note: The evaluation dataset for the above precision metrics is a self-built layout area detection dataset by PaddleOCR, containing 892 common document images of Chinese and English papers, magazines, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b>
+</details>
+
+<p><b>Formula Recognition Module </b></p>
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Avg-BLEU(%)</th>
+<th>GPU Inference Time (ms)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+<td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">Training Model</a></td>
+<td>86.13</td>
+<td>2266.96</td>
+<td>1.4 G</td>
+<td>UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. The model is trained on a dataset of one million samples, including simple formulas, complex formulas, scanned formulas, and handwritten formulas, significantly improving the recognition accuracy of real-world formulas.</td>
+<tr>
+<td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">Training Model</a></td>
+<td>87.12</td>
+<td>202.25</td>
+<td>167.9 M</td>
+<td rowspan="2">PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle Vision Team. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network. Through parallel masking and model distillation techniques, it significantly improves inference speed while maintaining high recognition accuracy, making it suitable for applications requiring fast inference. The PP-FormulaNet-L version, on the other hand, uses Vary_VIT_B as its backbone network and is trained on a large-scale formula dataset, showing significant improvements in recognizing complex formulas compared to PP-FormulaNet-S.</td>
+</tr>
+<td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">Training Model</a></td>
+<td>92.13</td>
+<td>1976.52</td>
+<td>535.2 M</td>
+<tr>
+<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Training Model</a></td>
+<td>71.63</td>
 <td>-</td>
 <td>89.7 M</td>
-<td>LaTeX-OCR is a formula recognition algorithm based on a large autoregressive model. By using Hybrid ViT as the backbone network and transformer as the decoder, it significantly improves the accuracy of formula recognition.</td>
+<td>LaTeX-OCR is a formula recognition algorithm based on an autoregressive large model. It uses Hybrid ViT as the backbone network and a transformer as the decoder, significantly improving the accuracy of formula recognition.</td>
 </tr>
 </table>
 
-<b>Note: The above accuracy metrics are measured from the internally built formula recognition test set of PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on Tesla V100 GPUs, with precision type FP32.</b>
+
+
+<b>Note: The above accuracy metrics are measured using an internally built formula recognition test set within PaddleX. The BLEU score of LaTeX_OCR_rec on the LaTeX-OCR formula recognition test set is 0.8821. All model GPU inference times are based on machines with Tesla V100 GPUs, with precision type FP32.</b>
 
 ## 2. Quick Start
 PaddleX supports experiencing the formula recognition pipeline locally using the command line or Python.
@@ -89,15 +310,15 @@ paddlex --pipeline formula_recognition \
         --device gpu:0
 ```
 
-The relevant parameter descriptions can be referenced from [2.2.2 Python Script Integration](#222-python-script-integration).
+The relevant parameter descriptions can be referenced from [2.2 Integration via Python Script](#22-integration-via-python-script).
 
 After running, the results will be printed to the terminal, as shown below:
 
 ```bash
-{'res': {'input_path': 'general_formula_recognition.png', 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
+{'res': {'input_path': 'general_formula_recognition.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
 ```
 
-The explanation of the running result parameters can refer to the result interpretation in [2.2.2 Integration via Python Script](#222-python脚本方式集成).
+For an explanation of the result parameters, refer to the result interpretation in [2.2 Integration via Python Script](#22-integration-via-python-script).
 
 The visualization results are saved under `save_path`, where the visualization result of formula recognition is as follows:
 
@@ -197,7 +418,7 @@ In the above Python script, the following steps are executed:
 <td>
 <ul>
   <li><b>Python Var</b>: Image data represented by <code>numpy.ndarray</code></li>
-  <li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
+  <li><b>str</b>: Local path of image or PDF file, e.g., <code>/root/data/img.jpg</code>; <b>URL link</b>, e.g., network URL of image or PDF file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png">Example</a>; <b>Local directory</b>, the directory should contain images to be predicted, e.g., local path: <code>/root/data/</code> (currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)</li>
   <li><b>List</b>: Elements of the list must be of the above types, e.g., <code>[numpy.ndarray, numpy.ndarray]</code>, <code>[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]</code>, <code>[\"/root/data1\", \"/root/data2\"]</code></li>
 </ul>
 </td>
@@ -377,6 +598,8 @@ In the above Python script, the following steps are executed:
 
     - `input_path`: `(str)` The input path of the image to be predicted.
 
+    - `page_index`: `(Union[int, None])` If the input is a PDF file, this indicates the current page number of the PDF. Otherwise, it is `None`
+
     - `model_settings`: `(Dict[str, bool])` The model parameters required for the production line configuration.
 
         - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-production line.
@@ -400,7 +623,7 @@ In the above Python script, the following steps are executed:
         - `formula_region_id`: `(int)` The ID number predicted by formula recognition.
         - `dt_polys`: `(List[float])` The bounding box coordinates predicted by formula recognition, in the format [x_min, y_min, x_max, y_max], where (x_min, y_min) is the top-left corner and (x_max, y_max) is the bottom-right corner.
 
-- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list format.
+- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` types will be converted to list format.
 - Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_formula_res_img.{your_img_extension}`. If a file is specified, it will be saved directly to that file. (The production line usually contains many result images, so it is not recommended to specify a specific file path directly, otherwise multiple images will be overwritten and only the last one will be retained.)
 
 * In addition, you can also obtain the visualization image with results and the prediction results through attributes, as follows:
@@ -459,7 +682,7 @@ for res in output:
 ## 3. Development Integration/Deployment
 If the formula recognition production line meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
 
-If you need to integrate the formula recognition production line into your Python project, you can refer to the example code in [2.2 Python Script Method](#22-python脚本方式集成).
+If you need to integrate the formula recognition production line into your Python project, you can refer to the example code in [2.2 Integration via Python Script](#22-integration-via-python-script).
 
 In addition, PaddleX also provides three other deployment methods, which are detailed as follows:
 
@@ -699,7 +922,7 @@ Since the formula recognition pipeline consists of several modules, if the pipel
   <tbody>
     <tr>
       <td>Formulas are missing</td>
-      <td>Layout Area Detection Module</td>
+      <td>Layout Detection Module</td>
       <td><a href="../../../module_usage/tutorials/ocr_modules/layout_detection.en.md">Link</a></td>
     </tr>
     <tr>
@@ -732,13 +955,13 @@ SubModules:
   LayoutDetection:
     module_name: layout_detection
     model_name: PP-DocLayout-L
-    model_dir: null # 替换为微调后的版面区域检测模型权重路径
+    model_dir: null # Replace with the fine-tuned layout detection model weights path
 ...
 
   FormulaRecognition:
     module_name: formula_recognition
     model_name: PP-FormulaNet-L
-    model_dir: null # 替换为微调后的公式识别模型权重路径
+    model_dir: null # Replace with the fine-tuned formula recognition model weights path
     batch_size: 5
 
 SubPipelines:
@@ -750,7 +973,7 @@ SubPipelines:
       DocOrientationClassify:
         module_name: doc_text_orientation
         model_name: PP-LCNet_x1_0_doc_ori
-        model_dir: null # 替换为微调后的文档图像方向分类模型权重路径
+        model_dir: null # Replace with the fine-tuned document image orientation classification model weights path
         batch_size: 1
 ...
 ```
@@ -760,7 +983,7 @@ Then, refer to the command-line or Python script methods in [2. Quick Start](#2-
 ##  5. Multi-Hardware Support
 PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. You can seamlessly switch between different hardware devices by simply modifying the `--device` parameter.
 
-For example, if you use Ascend NPU for formula recognition pipeline inference, the Python command is:
+For example, if you use Ascend NPU for formula recognition pipeline inference, the CLI command is:
 
 ```bash
 paddlex --pipeline formula_recognition \
@@ -776,5 +999,6 @@ paddlex --pipeline formula_recognition \
         --device npu:0
 
 ```
+Of course, you can also specify the hardware device when calling `create_pipeline()` or `predict()` in a Python script.
 
 If you want to use the formula recognition production line on more types of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
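
As a companion to the NPU note above, a minimal Python sketch of selecting the device when creating the formula recognition pipeline; the output directory is illustrative, and the input URL is the demo image referenced earlier in this file.

```python
# Illustrative sketch: device selection for the formula recognition pipeline.
# Swap "npu:0" for "gpu:0" or "cpu" depending on the available hardware.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="formula_recognition", device="npu:0")
output = pipeline.predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png"
)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")   # {basename}_formula_res_img.{ext}
    res.save_to_json(save_path="./output/")  # {basename}_res.json
```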

+ 11 - 11
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md

@@ -263,31 +263,31 @@ comments: true
 <table>
 <tr>
 <th>模型</th><th>模型下载链接</th>
-<th>Avg-BLEU</th>
+<th>Avg-BLEU(%)</th>
 <th>GPU推理耗时 (ms)</th>
 <th>模型存储大小 (M)</th>
 <th>介绍</th>
 </tr>
 <td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">训练模型</a></td>
-<td>0.8613</td>
+<td>86.13</td>
 <td>2266.96</td>
 <td>1.4 G</td>
 <td>UniMERNet是由上海AI Lab研发的一款公式识别模型。该模型采用Donut Swin作为编码器,MBartDecoder作为解码器,并通过在包含简单公式、复杂公式、扫描捕捉公式和手写公式在内的一百万数据集上进行训练,大幅提升了模型对真实场景公式的识别准确率</td>
 <tr>
 <td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">训练模型</a></td>
-<td>0.8712</td>
+<td>87.12</td>
 <td>202.25</td>
 <td>167.9 M</td>
 <td rowspan="2">PP-FormulaNet 是由百度飞桨视觉团队开发的一款先进的公式识别模型,支持5万个常见LateX源码词汇的识别。PP-FormulaNet-S 版本采用了 PP-HGNetV2-B4 作为其骨干网络,通过并行掩码和模型蒸馏等技术,大幅提升了模型的推理速度,同时保持了较高的识别精度,适用于简单印刷公式、跨行简单印刷公式等场景。而 PP-FormulaNet-L 版本则基于 Vary_VIT_B 作为骨干网络,并在大规模公式数据集上进行了深入训练,在复杂公式的识别方面,相较于PP-FormulaNet-S表现出显著的提升,适用于简单印刷公式、复杂印刷公式、手写公式等场景。 </td>
 
 </tr>
 <td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">训练模型</a></td>
-<td>0.9213</td>
+<td>92.13</td>
 <td>1976.52</td>
 <td>535.2 M</td>
 <tr>
 <td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">训练模型</a></td>
-<td>0.7163</td>
+<td>71.63</td>
 <td>-</td>
 <td>89.7 M</td>
 <td>LaTeX-OCR是一种基于自回归大模型的公式识别算法,通过采用 Hybrid ViT 作为骨干网络,transformer作为解码器,显著提升了公式识别的准确性。</td>
@@ -317,14 +317,14 @@ paddlex --pipeline formula_recognition \
         --save_path ./output \
         --device gpu:0
 ```
-相关的参数说明可以参考[2.2.2 Python脚本方式集成](#222-python脚本方式集成)中的参数说明。
+相关的参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的参数说明。
 
 运行后,会将结果打印到终端上,结果如下:
 
 ```bash
-{'res': {'input_path': 'general_formula_recognition.png', 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
+{'res': {'input_path': 'general_formula_recognition.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False,'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9778407216072083, 'coordinate': [271.257, 648.50824, 1040.2291, 774.8482]}, ...]}, 'formula_res_list': [{'rec_formula': '\\small\\begin{aligned}{p(\\mathbf{x})=c(\\mathbf{u})\\prod_{i}p(x_{i}).}\\\\ \\end{aligned}', 'formula_region_id': 1, 'dt_polys': ([553.0718, 802.0996, 758.75635, 853.093],)}, ...]}}
 ```
-运行结果参数说明可以参考[2.2.2 Python脚本方式集成](#222-python脚本方式集成)中的结果解释。
+运行结果参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的结果解释。
 
 
 可视化结果保存在`save_path`下,其中公式识别的可视化结果如下:
@@ -365,7 +365,7 @@ for res in output:
 
 在上述 Python 脚本中,执行了如下几个步骤:
 
-(1)通过 `create_pipeline()` 实例化 公式识别 产线对象,具体参数说明如下:
+(1)通过 `create_pipeline()` 实例化公式识别产线对象,具体参数说明如下:
 
 <table>
 <thead>
@@ -423,7 +423,7 @@ for res in output:
 <td>
 <ul>
   <li><b>Python Var</b>:如 <code>numpy.ndarray</code> 表示的图像数据</li>
-  <li><b>str</b>:如图像文件或者PDF文件的本地路径:<code>/root/data/img.jpg</code>;<b>如URL链接</b>,如图像文件或PDF文件的网络URL:<a href = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png">示例</a>;<b>如本地目录</b>,该目录下需包含待预测图像,如本地路径:<code>/root/data/</code>(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)</li>
+  <li><b>str</b>:如图像文件或者PDF文件的本地路径:<code>/root/data/img.jpg</code>;<b>如URL链接</b>,如图像文件或PDF文件的网络URL:<a href = "https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png">示例</a>;<b>如本地目录</b>,该目录下需包含待预测图像,如本地路径:<code>/root/data/</code>(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)</li>
   <li><b>List</b>:列表元素需为上述类型数据,如<code>[numpy.ndarray, numpy.ndarray]</code>,<code>[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]</code>,<code>[\"/root/data1\", \"/root/data2\"]</code></li>
 </ul>
 </td>
@@ -634,7 +634,7 @@ for res in output:
         - `formula_region_id`: `(int)` 公式识别预测的id编号
         - `dt_polys`:  `(List[float])` 公式识别预测的边界框坐标,格式为[x_min, y_min, x_max, y_max],其中(x_min, y_min)为左上角坐标,(x_max, y_max) 为右上角坐标
 
-- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
+- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_res.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
 - 调用`save_to_img()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_img_basename}_formula_res_img.{your_img_extension}`,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果图片,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一张图)
 
 * 此外,也支持通过属性获取带结果的可视化图像和预测结果,具体如下:

+ 26 - 14
docs/pipeline_usage/tutorials/video_pipelines/video_classification.en.md

@@ -15,33 +15,35 @@ The general video classification pipeline is used to solve video classification
 
 <table>
 <tr>
-<th>Model</th><th>Model Download Link</th><th>Top1 Acc(%)</th><th>Model Storage Size (M)</th><th>Description</th>
+<th>Model</th><th>Model Download Link</th>
+<th>Top1 Acc(%)</th>
+<th>Model Storage Size (M)</th>
+<th>Description</th>
 </tr>
 <tr>
-<td>PPTSM_ResNet50_k400_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PPTSM_ResNet50_k400_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSM_ResNet50_k400_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSM-R50_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSM-R50_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSM-R50_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>74.36</td>
 <td>93.4 M</td>
 <td rowspan="1">
-PP-TSM is a video classification model developed by Baidu PaddlePaddle Vision Team. The model is optimized based on the ResNet-50 backbone network, with model tuning in six aspects: data augmentation, network structure fine-tuning, training strategy, BN layer optimization, pretrained model selection, and model distillation. Under the center sampling evaluation method, the accuracy on Kinetics-400 is improved by 3.95 points compared to the original paper.
+PP-TSM is a video classification model developed by Baidu PaddlePaddle's Vision Team. This model is optimized based on the ResNet-50 backbone network and undergoes model tuning in six aspects: data augmentation, network structure fine-tuning, training strategies, Batch Normalization (BN) layer optimization, pre-trained model selection, and model distillation. Under the center crop evaluation method, its accuracy on Kinetics-400 is improved by 3.95 points compared to the original paper's implementation.
 </td>
 </tr>
 
 <tr>
-<td>PPTSMv2_LCNet_k400_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PPTSMv2_LCNet_k400_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSMv2_LCNet_k400_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSMv2-LCNetV2_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSMv2-LCNetV2_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSMv2-LCNetV2_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>71.71</td>
 <td>22.5 M</td>
-<td rowspan="2">PP-TSMv2 is a lightweight video classification model optimized based on the CPU-side model PP-LCNetV2. The model tuning includes seven aspects: backbone network and pretrained model selection, data augmentation, TSM module tuning, input frame number optimization, decoding speed optimization, DML distillation, and LTA module. Under the center sampling evaluation method, the accuracy reaches 75.16%, and the inference speed for a 10s video on the CPU side is only 456ms.
-</td>
+<td rowspan="2">PP-TSMv2 is a lightweight video classification model optimized based on the CPU-oriented model PP-LCNetV2. It undergoes model tuning in seven aspects: backbone network and pre-trained model selection, data augmentation, TSM module tuning, input frame number optimization, decoding speed optimization, DML distillation, and LTA module. Under the center crop evaluation method, it achieves an accuracy of 75.16%, with an inference speed of only 456ms on the CPU for a 10-second video input.</td>
 </tr>
 <tr>
-<td>PPTSMv2_LCNet_k400_16frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PPTSMv2_LCNet_k400_16frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSMv2_LCNet_k400_16frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSMv2-LCNetV2_16frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSMv2-LCNetV2_16frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSMv2-LCNetV2_16frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>73.11</td>
 <td>22.5 M</td>
 </tr>
 
 </table>
 
-<p><b>Note: The above accuracy metrics are Top1 Acc on the <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.en.md">K400</a> validation set. All model GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision, and CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p></details>
+<p><b>Note: The above accuracy metrics refer to Top-1 Accuracy on the <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/dataset/k400.md">K400</a> validation set. </b></p></details>
 
 ## 2. Quick Start
 
@@ -60,15 +62,16 @@ paddlex --pipeline video_classification \
     --device gpu:0
 ```
 
-The relevant parameter descriptions can be found in the parameter descriptions in [2.2.2 Integration via Python Script]().
+The relevant parameter descriptions can be found in [2.2 Integration with Python Script](#22-integration-with-python-script).
 
 After running, the result will be printed to the terminal, as follows:
 
 ```bash
-{'res': {'input_path': 'general_video_classification_001.mp4', 'class_ids': array([  0, 278,  68, 272, 162], dtype=int32), 'scores': [0.91996, 0.07055, 0.00235, 0.00215, 0.00158], 'label_names': ['abseiling', 'rock_climbing', 'climbing_tree', 'riding_mule', 'ice_climbing']}}
+{'res': {'input_path': 'general_video_classification_001.mp4', 'class_ids': array([  0, ..., 162], dtype=int32), 'scores': [0.91997, 0.07052, 0.00237, 0.00214, 0.00158], 'label_names': ['abseiling', 'rock_climbing', 'climbing_tree', 'riding_mule', 'ice_climbing']}}
 ```
 
-The explanation of the result parameters can refer to the result explanation in [2.2.2 Integration with Python Script](#222-integration-with-python-script).
+For an explanation of the result parameters, refer to the result explanation in [2.2 Integration with Python Script](#22-integration-with-python-script).
+
 
 The visualization results are saved under `save_path`, and the visualization result for video classification is as follows:
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/video_classification/02.jpg" style="width: 70%">
@@ -110,6 +113,13 @@ In the above Python script, the following steps are executed:
 <td><code>None</code></td>
 </tr>
 <tr>
+<td><code>config</code></td>
+<td>Specific configuration information for the pipeline (if set together with <code>pipeline</code>, it takes precedence over <code>pipeline</code>, and the pipeline name inside it must match <code>pipeline</code>).
+</td>
+<td><code>dict[str, Any]</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
 <td><code>device</code></td>
 <td>The inference device for the pipeline. It supports specifying the specific card number of the GPU, such as "gpu:0", other hardware card numbers, such as "npu:0", and CPU as "cpu".</td>
 <td><code>str</code></td>
@@ -251,7 +261,7 @@ In the above Python script, the following steps are executed:
     - `scores`: `(List[float])` List of confidence scores for video classification
     - `label_names`: `(List[str])` List of categories for video classification
 
-- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_video_basename}.json`; if specified as a file, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, the `numpy.array` types will be converted to lists.
+- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_video_basename}_res.json`; if specified as a file, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, the `numpy.array` types will be converted to lists.
 
 - Calling the `save_to_video()` method will save the visualization results to the specified `save_path`. If specified as a directory, the saved path will be `save_path/{your_video_basename}_res.{your_video_extension}`; if specified as a file, it will be saved directly to that file. (The production line usually contains multiple result videos, so it is not recommended to specify a specific file path directly, as multiple videos will be overwritten and only the last video will be retained)
 
@@ -302,7 +312,7 @@ for res in output:
 ## 3. Development Integration/Deployment
 If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
 
-If you need to apply the pipeline directly to your Python project, you can refer to the example code in [2.2 Python Script Integration](#22-python-script-integration).
+If you need to apply the pipeline directly to your Python project, you can refer to the example code in [2.2 Integration with Python Script](#22-integration-with-python-script).
 
 Additionally, PaddleX provides three other deployment methods, detailed as follows:
 
@@ -881,7 +891,7 @@ SubModules:
   VideoClassification:
     module_name: video_classification
     model_name: PP-TSMv2-LCNetV2_8frames_uniform
-    model_dir: null # 替换为微调后的视频分类模型权重路径
+    model_dir: null # Replace with the fine-tuned video classification model weights path
     batch_size: 1
     topk: 1
 
@@ -903,4 +913,6 @@ paddlex --pipeline video_classification \
     --device npu:0
 ```
 
+Of course, you can also specify the hardware device when calling `create_pipeline()` or `predict()` in a Python script.
+
 If you want to use the General Video Classification Production Line on a wider variety of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
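
To tie together the renamed models, the `config`/`device` options, and the `_res` save naming documented in this file, a minimal Python sketch follows; the input file name matches the demo video used above, and the output directory is illustrative.

```python
# Illustrative sketch: running the general video classification pipeline and
# saving results with the documented `_res` naming convention.
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="video_classification", device="gpu:0")
output = pipeline.predict(input="general_video_classification_001.mp4")
for res in output:
    res.print()                               # class_ids / scores / label_names
    res.save_to_video(save_path="./output/")  # {basename}_res.{your_video_extension}
    res.save_to_json(save_path="./output/")   # {basename}_res.json
```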

+ 13 - 7
docs/pipeline_usage/tutorials/video_pipelines/video_classification.md

@@ -44,7 +44,7 @@ PP-TSM是一种百度飞桨视觉团队自研的视频分类模型。该模型
 
 </table>
 
-<p><b>注:以上精度指标为 <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> 验证集 Top1 Acc。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p></details>
+<p><b>注:以上精度指标为 <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> 验证集 Top1 Acc。</b></p></details>
 
 ## 2. 快速开始
 
@@ -62,13 +62,13 @@ paddlex --pipeline video_classification \
     --save_path ./output \
     --device gpu:0
 ```
-相关的参数说明可以参考[2.2.2 Python脚本方式集成](#222-python脚本方式集成)中的参数说明。
+相关的参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的参数说明。
 
 运行后,会将结果打印到终端上,结果如下:
 ```bash
-{'res': {'input_path': 'general_video_classification_001.mp4', 'class_ids': array([  0, 278,  68, 272, 162], dtype=int32), 'scores': [0.91996, 0.07055, 0.00235, 0.00215, 0.00158], 'label_names': ['abseiling', 'rock_climbing', 'climbing_tree', 'riding_mule', 'ice_climbing']}}
+{'res': {'input_path': 'general_video_classification_001.mp4', 'class_ids': array([  0, ..., 162], dtype=int32), 'scores': [0.91997, 0.07052, 0.00237, 0.00214, 0.00158], 'label_names': ['abseiling', 'rock_climbing', 'climbing_tree', 'riding_mule', 'ice_climbing']}}
 ```
-运行结果参数说明可以参考[2.2.2 Python脚本方式集成](#222-python脚本方式集成)中的结果解释。
+运行结果参数说明可以参考[2.2 Python脚本方式集成](#22-python脚本方式集成)中的结果解释。
 
 
 可视化结果保存在`save_path`下,其中视频分类的可视化结果如下:
@@ -93,7 +93,7 @@ for res in output:
 
 在上述 Python 脚本中,执行了如下几个步骤:
 
-(1)通过 `create_pipeline()` 实例化 视频分类产线对象,具体参数说明如下:
+(1)通过 `create_pipeline()` 实例化视频分类产线对象,具体参数说明如下:
 
 <table>
 <thead>
@@ -112,6 +112,12 @@ for res in output:
 <td><code>None</code></td>
 </tr>
 <tr>
+<td><code>config</code></td>
+<td>产线具体的配置信息(如果和<code>pipeline</code>同时设置,优先级高于<code>pipeline</code>,且要求产线名和<code>pipeline</code>一致)。</td>
+<td><code>dict[str, Any]</code></td>
+<td><code>None</code></td>
+</tr>
+<tr>
 <td><code>device</code></td>
 <td>产线推理设备。支持指定GPU具体卡号,如“gpu:0”,其他硬件具体卡号,如“npu:0”,CPU如“cpu”。</td>
 <td><code>str</code></td>
@@ -255,8 +261,8 @@ for res in output:
     - `label_names`: `(List[str])` 视频分类的类别列表
 
 
-- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_video_basename}.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
-- 调用`save_to_video()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_video_basename}_res.{your_video_extension}`,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果视频,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一个视频)
+- 调用`save_to_json()` 方法会将上述内容保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_video_basename}_res.json`,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的`numpy.array`类型转换为列表形式。
+- 调用`save_to_video()` 方法会将可视化结果保存到指定的`save_path`中,如果指定为目录,则保存的路径为`save_path/{your_video_basename}_res.{your_video_extension}`,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果视频,不建议直接指定为具体的文件路径,否则多个视频会被覆盖,仅保留最后一个视频)
 
 * 此外,也支持通过属性获取带结果的可视化视频和预测结果,具体如下:
 

+ 1 - 1
paddlex/configs/modules/formula_recognition/PP-FormulaNet-L.yaml

@@ -34,7 +34,7 @@ Export:
 
 Predict:
   batch_size: 1
-  model_dir: "output/best_accuracy/inference_pir"
+  model_dir: "output/best_accuracy/inference"
   input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png"
   kernel_option:
     run_mode: paddle

+ 1 - 1
paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml

@@ -34,7 +34,7 @@ Export:
 
 Predict:
   batch_size: 1
-  model_dir: "output/best_accuracy/inference_pir"
+  model_dir: "output/best_accuracy/inference"
   input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png"
   kernel_option:
     run_mode: paddle

+ 1 - 1
paddlex/configs/modules/formula_recognition/UniMERNet.yaml

@@ -34,7 +34,7 @@ Export:
 
 Predict:
   batch_size: 1
-  model_dir: "output/best_accuracy/inference_pir"
+  model_dir: "output/best_accuracy/inference"
   input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png"
   kernel_option:
     run_mode: paddle

+ 15 - 0
paddlex/modules/formula_recognition/dataset_checker/__init__.py

@@ -13,6 +13,7 @@
 # limitations under the License.
 
 
+from pathlib import Path
 from ...base import BaseDatasetChecker
 from .dataset_src import check, split_dataset, deep_analyse, convert
 
@@ -25,6 +26,20 @@ class FormulaRecDatasetChecker(BaseDatasetChecker):
     entities = MODELS
     sample_num = 10
 
+    def get_dataset_root(self, dataset_dir: str) -> str:
+        """find the dataset root dir
+
+        Args:
+            dataset_dir (str): the directory that contain dataset.
+
+        Returns:
+            str: the root directory of dataset.
+        """
+        anno_dirs = list(Path(dataset_dir).glob("**/train.txt"))
+        assert len(anno_dirs) == 1
+        dataset_dir = anno_dirs[0].parent.as_posix()
+        return dataset_dir
+
     def convert_dataset(self, src_dataset_dir: str) -> str:
         """convert the dataset from other type to specified type