
Modify doc (#2889)

* modify doc and repair bug in formula pipe

* modify doc

* modify doc1

* modify doc
liuhongen1234567, 10 months ago (parent commit 472770fa34)

+ 22 - 22
docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md

@@ -13,14 +13,14 @@ The formula recognition module is a crucial component of OCR (Optical Character
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
-<th>Normed Edit Distance</th>
 <th>BLEU Score</th>
+<th>Normed Edit Distance</th>
 <th>ExpRate (%)</th>
 <th>Model Size (M)</th>
 <th>Description</th>
 </tr>
 <tr>
-<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
+<td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-FormulaNet-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">Trained Model</a></td>
 <td>0.8821</td>
 <td>0.0823</td>
 <td>40.01</td>
@@ -38,7 +38,7 @@ After installing the wheel package, a few lines of code can complete the inferen
 
 ```python
 from paddlex import create_model
-model = create_model("LaTeX_OCR_rec")
+model = create_model("PP-FormulaNet-S")
 output = model.predict("general_formula_rec_001.png", batch_size=1)
 for res in output:
     res.print(json_format=False)
@@ -64,7 +64,7 @@ tar -xf ./dataset/ocr_rec_latexocr_dataset_example.tar -C ./dataset/
 A single command can complete data validation:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
@@ -108,7 +108,7 @@ After executing the above command, PaddleX will validate the dataset and summari
   },
   &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_latexocr_dataset_example&quot;,
   &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;LaTeXOCRDataset&quot;
+  &quot;dataset_type&quot;: &quot;FormulaRecDataset&quot;
 }
 </code></pre>
 <p>In the above validation results, <code>check_pass</code> being True indicates that the dataset format meets the requirements. Explanations for other indicators are as follows:
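The summary above is plain JSON, so downstream scripts can gate a pipeline on it with the standard library alone. A minimal sketch (the summary literal below is abbreviated from the output above; the exact file name PaddleX writes the summary under may vary by version):

```python
import json

# Abbreviated copy of the validation summary shown above.
raw = """
{
  "check_pass": true,
  "dataset_path": "./dataset/ocr_rec_latexocr_dataset_example",
  "show_type": "image",
  "dataset_type": "FormulaRecDataset"
}
"""

summary = json.loads(raw)

# Fail fast if the dataset did not pass validation.
assert summary["check_pass"], "dataset format check failed"
print(summary["dataset_type"])  # FormulaRecDataset
```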
@@ -126,34 +126,34 @@ After completing the data verification, you can convert the dataset format and r
 <details><summary>👉 <b>Details of Format Conversion / Dataset Splitting (Click to Expand)</b></summary>
 
 <p><b>(1) Dataset Format Conversion</b></p>
-<p>The formula recognition supports converting <code>MSTextRecDataset</code> format datasets to <code>LaTeXOCRDataset</code> format ( <code>PKL</code> format ). The parameters for dataset format conversion can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Examples of some parameters in the configuration file are as follows:</p>
+<p>Formula recognition supports converting <code>FormulaRecDataset</code>-format datasets to the <code>LaTeXOCRDataset</code> format (<code>PKL</code> format). The parameters for dataset format conversion can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Examples of some parameters in the configuration file are as follows:</p>
 <ul>
 <li><code>CheckDataset</code>:</li>
 <li><code>convert</code>:</li>
-<li><code>enable</code>: Whether to perform dataset format conversion. Formula recognition supports converting <code>MSTextRecDataset</code> format datasets to <code>LaTeXOCRDataset</code> format, default is <code>True</code>;</li>
-<li><code>src_dataset_type</code>: If dataset format conversion is performed, the source dataset format needs to be set, default is <code>MSTextRecDataset</code>;</li>
+<li><code>enable</code>: Whether to perform dataset format conversion. Formula recognition supports converting <code>FormulaRecDataset</code> format datasets to <code>LaTeXOCRDataset</code> format, default is <code>True</code>;</li>
+<li><code>src_dataset_type</code>: If dataset format conversion is performed, the source dataset format needs to be set, default is <code>FormulaRecDataset</code>;</li>
 </ul>
-<p>For example, if you want to convert a <code>MSTextRecDataset</code> format dataset to <code>LaTeXOCRDataset</code> format, you need to modify the configuration file as follows:</p>
+<p>For example, if you want to convert a <code>FormulaRecDataset</code> format dataset to <code>LaTeXOCRDataset</code> format, you need to modify the configuration file as follows:</p>
 <pre><code class="language-bash">......
 CheckDataset:
   ......
   convert:
     enable: True
-    src_dataset_type: MSTextRecDataset
+    src_dataset_type: FormulaRecDataset
   ......
 </code></pre>
 <p>Then execute the command:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 </code></pre>
 <p>After the data conversion is executed, the original annotation files will be renamed to <code>xxx.bak</code> in the original path.</p>
 <p>The above parameters also support being set by appending command line arguments:</p>
-<pre><code class="language-bash">python main.py -c  paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c  paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example \
     -o CheckDataset.convert.enable=True \
-    -o CheckDataset.convert.src_dataset_type=MSTextRecDataset
+    -o CheckDataset.convert.src_dataset_type=FormulaRecDataset
 </code></pre>
 <p><b>(2) Dataset Splitting</b></p>
 <p>The parameters for dataset splitting can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Examples of some parameters in the configuration file are as follows:</p>
@@ -174,13 +174,13 @@ CheckDataset:
   ......
 </code></pre>
 <p>Then execute the command:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 </code></pre>
 <p>After the data splitting is executed, the original annotation files will be renamed to <code>xxx.bak</code> in the original path.</p>
 <p>The above parameters also support being set by appending command line arguments:</p>
-<pre><code class="language-bash">python main.py -c  paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c  paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example \
     -o CheckDataset.split.enable=True \
@@ -189,16 +189,16 @@ CheckDataset:
 </code></pre></details>
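Conceptually, the `split` step above just re-partitions the sample list by the configured percentages. A rough stdlib-only sketch of that idea (illustrative only, not PaddleX's actual implementation; PaddleX also rewrites the annotation files):

```python
def split_dataset(samples, train_percent):
    """Partition samples into train/val by percentage (train_percent out of 100)."""
    n_train = len(samples) * train_percent // 100
    return samples[:n_train], samples[n_train:]

# e.g. a 90/10 split of 100 annotated samples
train, val = split_dataset(list(range(100)), train_percent=90)
print(len(train), len(val))  # 90 10
```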
 
 ### 4.2 Model Training
-Model training can be completed with a single command, taking the training of the formula recognition model LaTeX_OCR_rec as an example:
+Model training can be completed with a single command, taking the training of the formula recognition model PP-FormulaNet-S as an example:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
     -o Global.mode=train \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
 The following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here it is `LaTeX_OCR_rec.yaml`,When training other models, you need to specify the corresponding configuration files. The relationship between the model and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.en.md))
+* Specify the `.yaml` configuration file path for the model (here it is `PP-FormulaNet-S.yaml`; when training other models, you need to specify the corresponding configuration file. The relationship between models and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.en.md))
 * Set the mode to model training: `-o Global.mode=train`
 * Specify the path to the training dataset: `-o Global.dataset_dir`.
 Other related parameters can be set by modifying the `Global` and `Train` fields in the `.yaml` configuration file, or adjusted by appending parameters in the command line. For example, to specify training on the first two GPUs: `-o Global.device=gpu:0,1`; to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file instructions for the corresponding task module of the model [PaddleX Common Configuration File Parameters](../../instructions/config_parameters_common.en.md).
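The `-o Section.key=value` overrides described above conceptually update the nested configuration loaded from the `.yaml` file. An illustrative sketch of that merge (not PaddleX's actual parser; note that values arrive as strings):

```python
def apply_override(config, override):
    """Apply one '-o Section.key=value' style override in place."""
    path, _, value = override.partition("=")
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value  # values stay strings, as on the command line

cfg = {"Global": {"mode": None}, "Train": {"epochs_iters": 20}}
for ov in ["Global.mode=train", "Global.device=gpu:0,1", "Train.epochs_iters=10"]:
    apply_override(cfg, ov)
print(cfg)  # {'Global': {'mode': 'train', 'device': 'gpu:0,1'}, 'Train': {'epochs_iters': '10'}}
```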
@@ -224,13 +224,13 @@ Other related parameters can be set by modifying the `Global` and `Train` fields
 After completing model training, you can evaluate the specified model weight file on the validation set to verify the model's accuracy. Using PaddleX for model evaluation can be done with a single command:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
     -o Global.mode=evaluate \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
 Similar to model training, the following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here it is `LaTeX_OCR_rec.yaml`)
+* Specify the `.yaml` configuration file path for the model (here it is `PP-FormulaNet-S.yaml`)
 * Set the mode to model evaluation: `-o Global.mode=evaluate`
 * Specify the path to the validation dataset: `-o Global.dataset_dir`.
 Other related parameters can be set by modifying the `Global` and `Evaluate` fields in the `.yaml` configuration file, detailed instructions can be found in [PaddleX Common Configuration File Parameters](../../instructions/config_parameters_common.en.md).
@@ -248,14 +248,14 @@ After completing model training and evaluation, you can use the trained model we
 #### 4.4.1 Model Inference
 To perform inference prediction through the command line, simply use the following command. Before running the following code, please download the [demo image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png) to your local machine.
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=predict \
     -o Predict.model_dir="./output/best_accuracy/inference" \
     -o Predict.input="general_formula_rec_001.png"
 ```
 Similar to model training and evaluation, the following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here it is `LaTeX_OCR_rec.yaml`)
+* Specify the `.yaml` configuration file path for the model (here it is `PP-FormulaNet-S.yaml`)
 * Set the mode to model inference prediction: `-o Global.mode=predict`
 * Specify the model weights path: `-o Predict.model_dir="./output/best_accuracy/inference"`
 * Specify the input data path: `-o Predict.input="..."`.

+ 235 - 25
docs/module_usage/tutorials/ocr_modules/formula_recognition.md

@@ -9,18 +9,47 @@ comments: true
 
 ## 2. Supported Model List
 
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Avg-BLEU</th>
+<th>GPU Inference Time (ms)</th>
+<th>Model Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>UniMERNet</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/UniMERNet_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/UniMERNet_pretrained.pdparams">Trained Model</a></td>
+<td>0.8613</td>
+<td>2266.96</td>
+<td>1.4 G</td>
+<td>UniMERNet is a formula recognition model developed by Shanghai AI Lab. It uses Donut Swin as the encoder and MBartDecoder as the decoder. Trained on a dataset of one million samples covering simple, complex, scanned, and handwritten formulas, it substantially improves recognition accuracy on real-world formulas</td>
+</tr>
+<tr>
+<td>PP-FormulaNet-S</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-S_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-S_pretrained.pdparams">Trained Model</a></td>
+<td>0.8712</td>
+<td>202.25</td>
+<td>167.9 M</td>
+<td rowspan="2">PP-FormulaNet is an advanced formula recognition model developed by the Baidu PaddlePaddle vision team. PP-FormulaNet-S uses PP-HGNetV2-B4 as its backbone and, through techniques such as parallel masking and model distillation, greatly accelerates inference while keeping high recognition accuracy, making it well suited to scenarios with tight latency requirements. PP-FormulaNet-L is built on a Vary_VIT_B backbone and trained extensively on a large-scale formula dataset; it shows a marked improvement over PP-FormulaNet-S on complex formulas.</td>
+</tr>
+<tr>
+<td>PP-FormulaNet-L</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-FormulaNet-L_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-FormulaNet-L_pretrained.pdparams">Trained Model</a></td>
+<td>0.9213</td>
+<td>1976.52</td>
+<td>535.2 M</td>
+</tr>
+</table>
+
+<b>Note: the accuracy metrics above are measured on PaddleX's internal self-built formula recognition test set. All GPU inference times are measured on a Tesla V100 GPU machine with FP32 precision.</b>
+
 
 <table>
 <tr>
 <th>Model</th><th>Model Download Link</th>
-<th>normed edit distance</th>
 <th>BLEU score</th>
+<th>normed edit distance</th>
 <th>ExpRate (%)</th>
 <th>Model Size (M)</th>
 <th>Description</th>
 </tr>
 <tr>
-<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
+<td>LaTeX_OCR_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/LaTeX_OCR_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/LaTeX_OCR_rec_pretrained.pdparams">Trained Model</a></td>
 <td>0.8821</td>
 <td>0.0823</td>
 <td>40.01</td>
@@ -31,6 +60,8 @@ comments: true
 
 <b>Note: the accuracy metrics above are measured on the LaTeX-OCR formula recognition test set.</b>
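The "normed edit distance" column above is commonly defined as the Levenshtein distance between the predicted and reference LaTeX strings, divided by the reference length (lower is better; 0 means an exact match). A stdlib-only sketch under that assumed definition:

```python
def normed_edit_distance(pred, ref):
    """Levenshtein distance between two strings, normalized by the
    reference length (0.0 = exact match)."""
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))  # distances for the empty prediction prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete from pred
                        dp[j - 1] + 1,      # insert into pred
                        prev + (pred[i - 1] != ref[j - 1]))  # substitute
            prev = cur
    return dp[n] / max(n, 1)

print(normed_edit_distance(r"\frac{a}{b}", r"\frac{a}{b}"))  # 0.0
```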
 
+
+
 ## 3. Quick Integration
 > ❗ Before quick integration, please install the PaddleX wheel package first; see the [PaddleX Local Installation Tutorial](../../../installation/installation.md) for details
 
@@ -38,12 +69,184 @@ After installing the wheel package, a few lines of code complete the inference of the formula recognition module,
 
 ```python
 from paddlex import create_model
-model = create_model("LaTeX_OCR_rec")
-output = model.predict("general_formula_rec_001.png", batch_size=1)
+model = create_model(model_name="PP-FormulaNet-S")
+output = model.predict(input="general_formula_rec_001.png", batch_size=1)
 for res in output:
-    res.print(json_format=False)
-    res.save_to_json("./output/res.json")
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
 ```
+After running, the result obtained is:
+```bash
+{'res': {'input_path': 'general_formula_rec_001.png', 'rec_formula': '\\zeta_{0}(\\nu)=-{\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z{\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}}\\ \\ {vec\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}
+```
+The fields in the result have the following meanings:
+- `input_path`: the path of the input formula image
+- `rec_formula`: the predicted LaTeX source of the formula image
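The printed result is a nested dict, so the LaTeX source can be pulled out directly; a minimal sketch (the sample string below is abbreviated from the output above):

```python
# Abbreviated prediction result, mirroring the structure shown above.
result = {"res": {"input_path": "general_formula_rec_001.png",
                  "rec_formula": r"\zeta_{0}(\nu)=-\frac{\nu\varrho^{-2\nu}}{\pi}\int_{\mu}^{\infty}d\omega\ \dots"}}

latex = result["res"]["rec_formula"]
print(latex.startswith(r"\zeta"))  # True
```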
+
+
+The visualized image is shown below:
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/formula_recog/general_formula_rec_001_res.png">
+
+<b>Note: to visualize the formula recognition pipeline, run the following commands to install the LaTeX rendering environment:</b>
+```bash
+sudo apt-get update
+sudo apt-get install texlive texlive-latex-base texlive-latex-extra -y
+```
+
+Descriptions of the relevant methods and parameters are as follows:
+
+* `create_model` instantiates the formula recognition model (using `PP-FormulaNet-S` as an example), with details as follows:
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Description</th>
+<th>Type</th>
+<th>Options</th>
+<th>Default</th>
+</tr>
+</thead>
+<tr>
+<td><code>model_name</code></td>
+<td>Model name</td>
+<td><code>str</code></td>
+<td>Any model name supported by PaddleX</td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>model_dir</code></td>
+<td>Model storage path</td>
+<td><code>str</code></td>
+<td>None</td>
+<td>None</td>
+</tr>
+</table>
+
+* `model_name` must be specified; once it is, PaddleX's built-in model parameters are used by default. If `model_dir` is also specified, the user's custom model is used instead.
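The precedence just described (built-in weights for `model_name`, overridden when a `model_dir` is supplied) can be sketched as follows; this is illustrative logic only, not PaddleX internals:

```python
def resolve_model_source(model_name, model_dir=None):
    """Illustrative precedence: a user-provided model_dir wins,
    otherwise the built-in weights for model_name are used."""
    if model_name is None:
        raise ValueError("model_name must be specified")
    if model_dir is not None:
        return ("custom", model_dir)
    return ("builtin", model_name)

print(resolve_model_source("PP-FormulaNet-S"))                 # ('builtin', 'PP-FormulaNet-S')
print(resolve_model_source("PP-FormulaNet-S", "./my_model"))   # ('custom', './my_model')
```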
+
+* Call the `predict()` method of the formula recognition model for inference. The `predict()` method takes the parameters `input` and `batch_size`, described as follows:
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Description</th>
+<th>Type</th>
+<th>Options</th>
+<th>Default</th>
+</tr>
+</thead>
+<tr>
+<td><code>input</code></td>
+<td>Data to be predicted; multiple input types are supported</td>
+<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td>
+<ul>
+  <li><b>Python variable</b>, e.g. image data represented as <code>numpy.ndarray</code></li>
+  <li><b>File path</b>, e.g. the local path of an image file: <code>/root/data/img.jpg</code></li>
+  <li><b>URL</b>, e.g. the web URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png">example</a></li>
+  <li><b>Local directory</b>, which must contain the data files to predict, e.g. the local path <code>/root/data/</code></li>
+  <li><b>Dict</b>, whose <code>key</code> must match the task (e.g. <code>"img"</code> for image classification) and whose <code>val</code> supports the data types above, e.g. <code>{"img": "/root/data1"}</code></li>
+  <li><b>List</b>, whose elements must be of the types above, e.g. <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code>, <code>[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]</code></li>
+</ul>
+</td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>batch_size</code></td>
+<td>Batch size</td>
+<td><code>int</code></td>
+<td>Any integer</td>
+<td>1</td>
+</tr>
+</table>
+
+* Process the prediction results. Each sample's prediction result is a `dict`, and it supports printing, saving as an image, and saving as a `json` file:
+
+<table>
+<thead>
+<tr>
+<th>Method</th>
+<th>Description</th>
+<th>Parameter</th>
+<th>Type</th>
+<th>Description</th>
+<th>Default</th>
+</tr>
+</thead>
+<tr>
+<td rowspan="3"><code>print()</code></td>
+<td rowspan="3">Print the result to the terminal</td>
+<td><code>format_json</code></td>
+<td><code>bool</code></td>
+<td>Whether to format the output with <code>JSON</code> indentation</td>
+<td><code>True</code></td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Indentation level to prettify the <code>JSON</code> output and make it more readable; effective only when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When <code>True</code>, all non-<code>ASCII</code> characters are escaped; when <code>False</code>, the original characters are kept. Effective only when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td rowspan="3"><code>save_to_json()</code></td>
+<td rowspan="3">Save the result as a JSON-format file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>Path to save the file; when it is a directory, the saved file is named consistently with the input file</td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>Indentation level to prettify the <code>JSON</code> output and make it more readable; effective only when <code>format_json</code> is <code>True</code></td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>Whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When <code>True</code>, all non-<code>ASCII</code> characters are escaped; when <code>False</code>, the original characters are kept. Effective only when <code>format_json</code> is <code>True</code></td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>save_to_img()</code></td>
+<td>Save the result as an image-format file</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>Path to save the file; when it is a directory, the saved file is named consistently with the input file</td>
+<td>None</td>
+</tr>
+</table>
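The `save_path` behaviour noted in the table (when it is a directory, the saved file is named after the input file) can be sketched like this; illustrative only, not PaddleX internals:

```python
import os

def resolve_save_path(save_path, input_path, ext):
    """If save_path looks like a directory, derive the file name from the input."""
    if save_path.endswith(("/", os.sep)) or os.path.isdir(save_path):
        stem = os.path.splitext(os.path.basename(input_path))[0]
        return os.path.join(save_path, stem + ext)
    return save_path

print(resolve_save_path("./output/", "general_formula_rec_001.png", ".json"))
```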
+
+* In addition, the visualized image and the prediction result can also be obtained through attributes, as follows:
+
+<table>
+<thead>
+<tr>
+<th>Attribute</th>
+<th>Description</th>
+</tr>
+</thead>
+<tr>
+<td rowspan="1"><code>json</code></td>
+<td rowspan="1">Get the prediction result in <code>json</code> format</td>
+</tr>
+<tr>
+<td rowspan="1"><code>img</code></td>
+<td rowspan="1">Get the visualized image as a <code>dict</code></td>
+</tr>
+
+</table>
+
+
 For more information on how to use PaddleX's single-model inference APIs, see the [PaddleX Single-Model Python Script Usage Guide](../../instructions/model_python_API.md).
 
 ## 4. Secondary Development
@@ -63,7 +266,7 @@ tar -xf ./dataset/ocr_rec_latexocr_dataset_example.tar -C ./dataset/
 A single command completes data validation:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
@@ -108,7 +311,7 @@ python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
   },
   &quot;dataset_path&quot;: &quot;./dataset/ocr_rec_latexocr_dataset_example&quot;,
   &quot;show_type&quot;: &quot;image&quot;,
-  &quot;dataset_type&quot;: &quot;LaTeXOCRDataset&quot;
+  &quot;dataset_type&quot;: &quot;FormulaRecDataset&quot;
 }
 </code></pre>
 <p>In the validation results above, <code>check_pass</code> being True indicates that the dataset format meets the requirements; descriptions of the other indicators are as follows:</p>
@@ -127,34 +330,34 @@ python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
 <details><summary>👉 <b>Details of Format Conversion / Dataset Splitting (Click to Expand)</b></summary>
 
 <p><b>(1) Dataset Format Conversion</b></p>
-<p>Formula recognition supports converting <code>MSTextRecDataset</code>-format datasets to the <code>LaTeXOCRDataset</code> format (<code>PKL</code> format). The parameters for dataset format conversion can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Example descriptions of some configuration parameters are as follows:</p>
+<p>Formula recognition supports converting <code>FormulaRecDataset</code>-format datasets to the <code>LaTeXOCRDataset</code> format (<code>PKL</code> format). The parameters for dataset format conversion can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Example descriptions of some configuration parameters are as follows:</p>
 <ul>
 <li><code>CheckDataset</code>:</li>
 <li><code>convert</code>:</li>
-<li><code>enable</code>: whether to perform dataset format conversion; formula recognition supports converting <code>MSTextRecDataset</code>-format datasets to the <code>LaTeXOCRDataset</code> format, default <code>True</code>;</li>
-<li><code>src_dataset_type</code>: if dataset format conversion is performed, the source dataset format must be set, default <code>MSTextRecDataset</code>;</li>
+<li><code>enable</code>: whether to perform dataset format conversion; formula recognition supports converting <code>FormulaRecDataset</code>-format datasets to the <code>LaTeXOCRDataset</code> format, default <code>True</code>;</li>
+<li><code>src_dataset_type</code>: if dataset format conversion is performed, the source dataset format must be set, default <code>FormulaRecDataset</code>;</li>
 </ul>
-<p>For example, to convert an <code>MSTextRecDataset</code>-format dataset to the <code>LaTeXOCRDataset</code> format, modify the configuration file as follows:</p>
+<p>For example, to convert a <code>FormulaRecDataset</code>-format dataset to the <code>LaTeXOCRDataset</code> format, modify the configuration file as follows:</p>
 <pre><code class="language-bash">......
 CheckDataset:
   ......
   convert:
     enable: True
-    src_dataset_type: MSTextRecDataset
+    src_dataset_type: FormulaRecDataset
   ......
 </code></pre>
 <p>Then execute the command:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 </code></pre>
 <p>After the data conversion runs, the original annotation files are renamed to <code>xxx.bak</code> in the original path.</p>
 <p>The above parameters can also be set by appending command-line arguments:</p>
-<pre><code class="language-bash">python main.py -c  paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c  paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example \
     -o CheckDataset.convert.enable=True \
-    -o CheckDataset.convert.src_dataset_type=MSTextRecDataset
+    -o CheckDataset.convert.src_dataset_type=FormulaRecDataset
 </code></pre>
 <p><b>(2) Dataset Splitting</b></p>
 <p>The parameters for dataset splitting can be set by modifying the fields under <code>CheckDataset</code> in the configuration file. Example descriptions of some configuration parameters are as follows:</p>
@@ -175,13 +378,13 @@ CheckDataset:
   ......
 </code></pre>
 <p>Then execute the command:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 </code></pre>
 <p>After the data splitting runs, the original annotation files are renamed to <code>xxx.bak</code> in the original path.</p>
 <p>The above parameters can also be set by appending command-line arguments:</p>
-<pre><code class="language-bash">python main.py -c  paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+<pre><code class="language-bash">python main.py -c  paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example \
     -o CheckDataset.split.enable=True \
@@ -190,19 +393,26 @@ CheckDataset:
 </code></pre></details>
 
 ### 4.2 Model Training
-A single command completes model training; take training the formula recognition model LaTeX_OCR_rec here as an example:
+A single command completes model training; take training the formula recognition model PP-FormulaNet-S here as an example:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
+FLAGS_json_format_model=1 python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
     -o Global.mode=train \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
 The following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here `LaTeX_OCR_rec.yaml`; when training other models, specify the corresponding configuration file. The mapping between models and configuration files is listed in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.md))
+* Specify the `.yaml` configuration file path for the model (here `PP-FormulaNet-S.yaml`; when training other models, specify the corresponding configuration file. The mapping between models and configuration files is listed in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.md))
 * Set the mode to model training: `-o Global.mode=train`
 * Specify the training dataset path: `-o Global.dataset_dir`
 Other related parameters can be set by modifying the `Global` and `Train` fields in the `.yaml` configuration file, or adjusted by appending command-line arguments. For example, to train on the first two GPUs: `-o Global.device=gpu:0,1`; to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and detailed explanations, see the configuration file documentation for the corresponding task module: [PaddleX Common Configuration File Parameters](../../instructions/config_parameters_common.md).
+* Except for LaTeX_OCR_rec, formula recognition models only support exporting models in JSON format, so the flag `FLAGS_json_format_model=1` must be set during training.
+* For the PP-FormulaNet-S, PP-FormulaNet-L, and UniMERNet models, additional Linux packages must also be installed before training, with the following commands:
+```bash
+sudo apt-get update
+sudo apt-get install libmagickwand-dev
+python -m pip install Wand
+```
 
 <details><summary>👉 <b>More Details (Click to Expand)</b></summary>
 
@@ -224,13 +434,13 @@ python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
 After model training is complete, you can evaluate the specified model weight file on the validation set to verify the model's accuracy. With PaddleX, a single command completes model evaluation:
 
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml  \
     -o Global.mode=evaluate \
     -o Global.dataset_dir=./dataset/ocr_rec_latexocr_dataset_example
 ```
 Similar to model training, the following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here `LaTeX_OCR_rec.yaml`)
+* Specify the `.yaml` configuration file path for the model (here `PP-FormulaNet-S.yaml`)
 * Set the mode to model evaluation: `-o Global.mode=evaluate`
 * Specify the validation dataset path: `-o Global.dataset_dir`
 Other related parameters can be set by modifying the `Global` and `Evaluate` fields in the `.yaml` configuration file; see [PaddleX Common Configuration File Parameters](../../instructions/config_parameters_common.md) for details.
@@ -247,14 +457,14 @@ python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml  \
 
 * To run inference from the command line, a single command suffices. Before running the code below, download the [demo image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_formula_rec_001.png) to your local machine.
 ```bash
-python main.py -c paddlex/configs/formula_recognition/LaTeX_OCR_rec.yaml \
+python main.py -c paddlex/configs/modules/formula_recognition/PP-FormulaNet-S.yaml \
     -o Global.mode=predict \
     -o Predict.model_dir="./output/best_accuracy/inference" \
     -o Predict.input="general_formula_rec_001.png"
 ```
 Similar to model training and evaluation, the following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here `LaTeX_OCR_rec.yaml`)
+* Specify the `.yaml` configuration file path for the model (here `PP-FormulaNet-S.yaml`)
 * Set the mode to model inference: `-o Global.mode=predict`
 * Specify the model weights path: `-o Predict.model_dir="./output/best_accuracy/inference"`
 * Specify the input data path: `-o Predict.input="..."`

+ 39 - 39
docs/module_usage/tutorials/ocr_modules/text_recognition.md

@@ -21,15 +21,15 @@ comments: true
 <tr>
 <td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a></td>
+<td>81.53</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>74.7 M</td>
 <td>PP-OCRv4_server_rec_doc is trained on top of PP-OCRv4_server_rec with a mixture of more Chinese document data and PP-OCR training data. It adds recognition of some traditional Chinese characters, Japanese, and special characters, supporting 15,000+ recognizable characters; beyond document-related text, it also improves general text recognition</td>
 </tr>
 <tr>
 <td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Trained Model</a></td>
-<td>78.20</td>
+<td>78.74</td>
 <td>7.95018</td>
 <td>46.7868</td>
 <td>10.6 M</td>
@@ -37,7 +37,7 @@ PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a><
 </tr>
 <tr>
 <td>PP-OCRv4_server_rec </td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Trained Model</a></td>
-<td>79.20</td>
+<td>80.61 </td>
 <td>7.19439</td>
 <td>140.179</td>
 <td>71.2 M</td>
@@ -46,10 +46,10 @@ PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a><
 <tr>
 <td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Trained Model</a></td>
+<td>70.39</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>6.8 M</td>
 <td>An ultra-lightweight English recognition model trained on the PP-OCRv4 recognition model, supporting English and digit recognition</td>
 </tr>
 </table>
@@ -74,15 +74,15 @@ en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Trained Model</a></
 <tr>
 <td>PP-OCRv4_server_rec_doc</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a></td>
+<td>81.53</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>74.7 M</td>
 <td>PP-OCRv4_server_rec_doc is trained on top of PP-OCRv4_server_rec with a mixture of more Chinese document data and PP-OCR training data. It adds recognition of some traditional Chinese characters, Japanese, and special characters, supporting 15,000+ recognizable characters; beyond document-related text, it also improves general text recognition</td>
 </tr>
 <tr>
 <td>PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_mobile_rec_pretrained.pdparams">Trained Model</a></td>
-<td>78.20</td>
+<td>78.74</td>
 <td>7.95018</td>
 <td>46.7868</td>
 <td>10.6 M</td>
@@ -90,7 +90,7 @@ PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a><
 </tr>
 <tr>
 <td>PP-OCRv4_server_rec </td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-OCRv4_server_rec_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_pretrained.pdparams">Trained Model</a></td>
-<td>79.20</td>
+<td>80.61 </td>
 <td>7.19439</td>
 <td>140.179</td>
 <td>71.2 M</td>
@@ -99,15 +99,15 @@ PP-OCRv4_server_rec_doc_infer.tar">Inference Model</a>/<a href="">Trained Model</a><
 <tr>
 <td>PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="">Trained Model</a></td>
+<td>72.96</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>9.2 M</td>
 <td>A lightweight PP-OCRv3 recognition model with high inference efficiency, deployable on a wide range of hardware including edge devices</td>
 </tr>
 </table>
 
-<p><b>Note: the evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street view, web images, documents, and handwriting, with 11,000 images for text recognition. GPU inference times for all models are measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speeds are measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
+<p><b>Note: the evaluation set for the above accuracy metrics is PaddleOCR's self-built Chinese dataset, covering street view, web images, documents, and handwriting, with 8367 images for text recognition. GPU inference times for all models are measured on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speeds are measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.</b></p>
 
 <table>
 <tr>
@@ -166,19 +166,19 @@ SVTRv2 is a recognition model developed by the OpenOCR team at the Vision and Learning Lab (FVL) of Fudan University
 <tr>
 <td>en_PP-OCRv4_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 en_PP-OCRv4_mobile_rec_infer.tar">Inference Model</a>/<a href="">Trained Model</a></td>
+<td> 70.39</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>6.8 M</td>
 <td>基于PP-OCRv4识别模型训练得到的超轻量英文识别模型,支持英文、数字识别</td>
 </tr>
 <tr>
 <td>en_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>70.69</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.8 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量英文识别模型,支持英文、数字识别</td>
 </tr>
 </table>
@@ -197,95 +197,95 @@ en_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></
 <tr>
 <td>korean_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 korean_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>60.21</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>8.6 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量韩文识别模型,支持韩文、数字识别</td>
 </tr>
 <tr>
 <td>japan_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 japan_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>45.69</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>8.8 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量日文识别模型,支持日文、数字识别</td>
 </tr>
 <tr>
 <td>chinese_cht_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 chinese_cht_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>82.06</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>9.7 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量繁体中文识别模型,支持繁体中文、数字识别</td>
 </tr>
 <tr>
 <td>te_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 te_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>95.88</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.8 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量泰卢固文识别模型,支持泰卢固文、数字识别</td>
 </tr>
 <tr>
 <td>ka_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 ka_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>96.96</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>8.0 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量卡纳达文识别模型,支持卡纳达文、数字识别</td>
 </tr>
 <tr>
 <td>ta_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 ta_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>76.83</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>8.0 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量泰米尔文识别模型,支持泰米尔文、数字识别</td>
 </tr>
 <tr>
 <td>latin_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 latin_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>76.93</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.8 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量拉丁文识别模型,支持拉丁文、数字识别</td>
 </tr>
 <tr>
 <td>arabic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 arabic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>73.55</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.8 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量阿拉伯字母识别模型,支持阿拉伯字母、数字识别</td>
 </tr>
 <tr>
 <td>cyrillic_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 cyrillic_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>94.28</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.9 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量斯拉夫字母识别模型,支持斯拉夫字母、数字识别</td>
 </tr>
 <tr>
 <td>devanagari_PP-OCRv3_mobile_rec</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/\
 devanagari_PP-OCRv3_mobile_rec_infer.tar">推理模型</a>/<a href="">训练模型</a></td>
+<td>96.44</td>
 <td></td>
 <td></td>
-<td></td>
-<td></td>
+<td>7.9 M</td>
 <td>基于PP-OCRv3识别模型训练得到的超轻量梵文字母识别模型,支持梵文字母、数字识别</td>
 </tr>
 </table>
-
+<p><b>注:以上精度指标的评估集是 PaddleX 自建的多语种数据集。 所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p>
 </details>
 
 ## 三、快速集成
@@ -333,8 +333,8 @@ for res in output:
 <td><code>model_name</code></td>
 <td>模型名称</td>
 <td><code>str</code></td>
+<td>所有PaddleX支持的模型名称</td>
 <td>无</td>
-<td><code>PP-OCRv4_mobile_rec</code></td>
 </tr>
 <tr>
 <td><code>model_dir</code></td>

+ 12 - 12
docs/module_usage/tutorials/video_modules/video_classification.en.md

@@ -27,7 +27,7 @@ PP-TSM is a video classification model developed by Baidu PaddlePaddle's Vision
 </tr>
 
 <tr>
-<td>PPTSMv2_LCNet_k400_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PPTSMv2_LCNet_k400_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSMv2_LCNet_k400_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
+<td>PP-TSMv2-LCNetV2_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-TSMv2-LCNetV2_8frames_uniform_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSMv2-LCNetV2_8frames_uniform_pretrained.pdparams">Trained Model</a></td>
 <td>71.71</td>
 <td>22.5 M</td>
 <td rowspan="2">PP-TSMv2 is a lightweight video classification model optimized based on the CPU-oriented model PP-LCNetV2. It undergoes model tuning in seven aspects: backbone network and pre-trained model selection, data augmentation, TSM module tuning, input frame number optimization, decoding speed optimization, DML distillation, and LTA module. Under the center crop evaluation method, it achieves an accuracy of 75.16%, with an inference speed of only 456ms on the CPU for a 10-second video input.</td>
@@ -49,7 +49,7 @@ After installing the wheel package, you can complete video classification module
 
 ```python
 from paddlex import create_model
-model = create_model("PPTSMv2_LCNet_k400_8frames_uniform")
+model = create_model("PP-TSMv2-LCNetV2_8frames_uniform")
 output = model.predict("general_video_classification_001.mp4", batch_size=1)
 for res in output:
     res.print(json_format=False)
@@ -75,7 +75,7 @@ tar -xf ./dataset/k400_examples.tar -C ./dataset/
 One command is all you need to complete data validation:
 
 ```bash
-python main.py -c paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+python main.py -c paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
@@ -160,13 +160,13 @@ CheckDataset:
   ......
 </code></pre>
 <p>Then execute the command:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples
 </code></pre>
 <p>After the data splitting is executed, the original annotation files will be renamed to <code>xxx.bak</code> in the original path.</p>
 <p>These parameters also support being set through appending command line arguments:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples \
     -o CheckDataset.split.enable=True \
@@ -175,16 +175,16 @@ CheckDataset:
 </code></pre></details>
 
 ### 4.2 Model Training
-A single command can complete the model training. Taking the training of the video classification model PPTSMv2_LCNet_k400_8frames_uniform as an example:
+A single command can complete the model training. Taking the training of the video classification model PP-TSMv2-LCNetV2_8frames_uniform as an example:
 ```
-python main.py -c paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml  \
+python main.py -c paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml  \
     -o Global.mode=train \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
 
 The following steps are required:
 
-* Specify the path of the model's `.yaml` configuration file (here it is `PPTSMv2_LCNet_k400_8frames_uniform.yaml`. When training other models, you need to specify the corresponding configuration files. The relationship between the model and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.en.md))
+* Specify the path of the model's `.yaml` configuration file (here it is `PP-TSMv2-LCNetV2_8frames_uniform.yaml`. When training other models, you need to specify the corresponding configuration files. The relationship between the model and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.en.md))
 * Specify the mode as model training: `-o Global.mode=train`
 * Specify the path of the training dataset: `-o Global.dataset_dir`. Other related parameters can be set by modifying the fields under `Global` and `Train` in the `.yaml` configuration file, or adjusted by appending parameters in the command line. For example, to specify training on the first 2 GPUs: `-o Global.device=gpu:0,1`; to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file parameter instructions for the corresponding task module of the model [PaddleX Common Model Configuration File Parameters](../../instructions/config_parameters_common.en.md).
 
@@ -208,13 +208,13 @@ the following steps are required:
 ## <b>4.3 Model Evaluation</b>
 After completing model training, you can evaluate the specified model weight file on the validation set to verify the model accuracy. Using PaddleX for model evaluation, a single command can complete the model evaluation:
 ```bash
-python main.py -c  paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml  \
+python main.py -c  paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml  \
     -o Global.mode=evaluate \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
 Similar to model training, the following steps are required:
 
-* Specify the path of the model's `.yaml` configuration file (here it is `PPTSMv2_LCNet_k400_8frames_uniform.yaml`)
+* Specify the path of the model's `.yaml` configuration file (here it is `PP-TSMv2-LCNetV2_8frames_uniform.yaml`)
 * Specify the mode as model evaluation: `-o Global.mode=evaluate`
 * Specify the path of the validation dataset: `-o Global.dataset_dir`. Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
 
@@ -230,14 +230,14 @@ After completing model training and evaluation, you can use the trained model we
 To perform inference prediction through the command line, simply use the following command. Before running the following code, please download the [demo video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/general_video_classification_001.mp4) to your local machine.
 
 ```bash
-python main.py -c paddlex/configs/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+python main.py -c paddlex/configs/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=predict \
     -o Predict.model_dir="./output/best_model/inference" \
     -o Predict.input="general_video_classification_001.mp4"
 ```
 Similar to model training and evaluation, the following steps are required:
 
-* Specify the `.yaml` configuration file path for the model (here it is `PPTSMv2_LCNet_k400_8frames_uniform.yaml`)
+* Specify the `.yaml` configuration file path for the model (here it is `PP-TSMv2-LCNetV2_8frames_uniform.yaml`)
 * Specify the mode as model inference prediction: `-o Global.mode=predict`
 * Specify the model weight path: `-o Predict.model_dir="./output/best_model/inference"`
 * Specify the input data path: `-o Predict.input="..."`

+ 191 - 17
docs/module_usage/tutorials/video_modules/video_classification.md

@@ -27,7 +27,7 @@ PP-TSM是一种百度飞桨视觉团队自研的视频分类模型。该模型
 </tr>
 
 <tr>
-<td>PPTSMv2_LCNet_k400_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PPTSMv2_LCNet_k400_8frames_uniform_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PPTSMv2_LCNet_k400_8frames_uniform_pretrained.pdparams">训练模型</a></td>
+<td>PP-TSMv2-LCNetV2_8frames_uniform</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/PP-TSMv2-LCNetV2_8frames_uniform_infer.tar">推理模型</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TSMv2-LCNetV2_8frames_uniform_pretrained.pdparams">训练模型</a></td>
 <td>71.71</td>
 <td>22.5 M</td>
 <td rowspan="2">PP-TSMv2是轻量化的视频分类模型,基于CPU端模型PP-LCNetV2进行优化,从骨干网络与预训练模型选择、数据增强、tsm模块调优、输入帧数优化、解码速度优化、DML蒸馏、LTA模块等7个方面进行模型调优,在中心采样评估方式下,精度达到75.16%,输入10s视频在CPU端的推理速度仅需456ms。</td>
@@ -42,7 +42,8 @@ PP-TSM是一种百度飞桨视觉团队自研的视频分类模型。该模型
 
 
 
-<p><b>注:以上精度指标为 <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> 验证集 Top1 Acc。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b></p></details>
+<p><b>注:以上精度指标为 <a href="https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/dataset/k400.md">K400</a> 验证集 Top1 Acc。</b></p></details>
+
 
 ## 三、快速集成
 > ❗ 在快速集成前,请先安装 PaddleX 的 wheel 包,详细请参考 [PaddleX本地安装教程](../../../installation/installation.md)。
@@ -51,14 +52,187 @@ PP-TSM是一种百度飞桨视觉团队自研的视频分类模型。该模型
 
 ```python
 from paddlex import create_model
-model = create_model("PPTSMv2_LCNet_k400_8frames_uniform")
-output = model.predict("general_video_classification_001.mp4", batch_size=1)
+model = create_model(model_name="PP-TSMv2-LCNetV2_8frames_uniform")
+output = model.predict(input="general_video_classification_001.mp4", batch_size=1)
 for res in output:
-    res.print(json_format=False)
-    res.save_to_video("./output/")
-    res.save_to_json("./output/res.json")
+    res.print()
+    res.save_to_video(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
 ```
 
+运行后,得到的结果为:
+```bash
+{'res': "{'input_path': 'general_video_classification_001.mp4', 'class_ids': array([0], dtype=int32), 'scores': array([0.91997], dtype=float32), 'label_names': ['abseiling']}"}
+```
+
+参数含义如下:
+- `input_path`:表示输入待预测视频的路径
+- `class_ids`:表示视频的分类id
+- `scores`:表示视频的分类分数
+- `label_names`:表示视频的分类标签名称
+
+可视化视频如下:
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/video_classification/general_video_classification_001.jpg">
+
+上述Python脚本中,执行了如下几个步骤:
+* `create_model`实例化视频分类模型(此处以`PP-TSMv2-LCNetV2_8frames_uniform`为例),具体说明如下:
+
+
+<table>
+<thead>
+<tr>
+<th>参数</th>
+<th>参数说明</th>
+<th>参数类型</th>
+<th>可选项</th>
+<th>默认值</th>
+</tr>
+</thead>
+<tr>
+<td><code>model_name</code></td>
+<td>模型名称</td>
+<td><code>str</code></td>
+<td>所有PaddleX支持的模型名称</td>
+<td>无</td>
+</tr>
+<tr>
+<td><code>model_dir</code></td>
+<td>模型存储路径</td>
+<td><code>str</code></td>
+<td>无</td>
+<td>无</td>
+</tr>
+</table>
+
+* 调用视频分类模型的`predict`方法进行推理预测,`predict` 方法参数为`input`,用于输入待预测数据,支持多种输入类型,具体说明如下:
+
+<table>
+<thead>
+<tr>
+<th>参数</th>
+<th>参数说明</th>
+<th>参数类型</th>
+<th>可选项</th>
+<th>默认值</th>
+</tr>
+</thead>
+<tr>
+<td><code>input</code></td>
+<td>待预测数据,支持多种输入类型</td>
+<td><code>Python Var</code>/<code>str</code>/<code>list</code></td>
+<td>
+<ul>
+  <li><b>Python变量</b>,如<code>str</code>表示的视频文件的本地路径</li>
+  <li><b>文件路径</b>,如视频文件的本地路径:<code>/root/data/video.mp4</code></li>
+  <li><b>URL链接</b>,如视频文件的网络URL:<a href = "https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/general_video_classification_001.mp4">示例</a></li>
+  <li><b>本地目录</b>,该目录下需包含待预测数据文件,如本地路径:<code>/root/data/</code></li>
+  <li><b>列表</b>,列表元素需为上述类型数据,如 <code>["/root/data/video1.mp4", "/root/data/video2.mp4"]</code>,<code>["/root/data1", "/root/data2"]</code></li>
+</ul>
+</td>
+<td>无</td>
+</tr>
+<tr>
+<td><code>batch_size</code></td>
+<td>批大小</td>
+<td><code>int</code></td>
+<td>无</td>
+<td>1</td>
+</tr>
+<tr>
+<td><code>topk</code></td>
+<td>预测结果的前 <code>topk</code> 个类别和对应的分类概率</td>
+<td><code>int</code></td>
+<td>无</td>
+<td><code>1</code></td>
+</tr>
+</table>
+
+* 对预测结果进行处理,每个样本的预测结果均为`dict`类型,且支持打印、保存为图片、保存为`json`文件的操作:
+
+<table>
+<thead>
+<tr>
+<th>方法</th>
+<th>方法说明</th>
+<th>参数</th>
+<th>参数类型</th>
+<th>参数说明</th>
+<th>默认值</th>
+</tr>
+</thead>
+<tr>
+<td rowspan = "3"><code>print()</code></td>
+<td rowspan = "3">打印结果到终端</td>
+<td><code>format_json</code></td>
+<td><code>bool</code></td>
+<td>是否使用 <code>json</code> 缩进格式化输出内容</td>
+<td><code>True</code></td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>json格式化设置,仅当<code>format_json</code>为<code>True</code>时有效</td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>json格式化设置,仅当<code>format_json</code>为<code>True</code>时有效</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td rowspan = "3"><code>save_to_json()</code></td>
+<td rowspan = "3">将结果保存为json格式的文件</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致</td>
+<td>无</td>
+</tr>
+<tr>
+<td><code>indent</code></td>
+<td><code>int</code></td>
+<td>json格式化设置</td>
+<td>4</td>
+</tr>
+<tr>
+<td><code>ensure_ascii</code></td>
+<td><code>bool</code></td>
+<td>json格式化设置</td>
+<td><code>False</code></td>
+</tr>
+<tr>
+<td><code>save_to_video()</code></td>
+<td>将结果保存为视频格式的文件</td>
+<td><code>save_path</code></td>
+<td><code>str</code></td>
+<td>保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致</td>
+<td>无</td>
+</tr>
+</table>
+
+* 此外,也支持通过属性获取结果可视化视频和`json`结果:
+
+<table>
+<thead>
+<tr>
+<th>属性</th>
+<th>属性说明</th>
+</tr>
+</thead>
+<tr>
+<td rowspan = "1"><code>json</code></td>
+<td rowspan = "1">获取预测的<code>json</code>格式的结果</td>
+</tr>
+<tr>
+<td rowspan = "1"><code>video</code></td>
+<td rowspan = "1">获取格式为<code>dict</code>的可视化视频和视频帧率</td>
+</tr>
+
+</table>
+
 关于更多 PaddleX 的单模型推理的 API 的使用方法,可以参考[PaddleX单模型Python脚本使用说明](../../instructions/model_python_API.md)。
 
 ## 四、二次开发
@@ -79,7 +253,7 @@ tar -xf ./dataset/k400_examples.tar -C ./dataset/
 一行命令即可完成数据校验:
 
 ```bash
-python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
@@ -165,13 +339,13 @@ CheckDataset:
   ......
 </code></pre>
 <p>随后执行命令:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples
 </code></pre>
 <p>数据划分执行之后,原有标注文件会被在原路径下重命名为 <code>xxx.bak</code>。</p>
 <p>以上参数同样支持通过追加命令行参数的方式进行设置:</p>
-<pre><code class="language-bash">python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+<pre><code class="language-bash">python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=check_dataset \
     -o Global.dataset_dir=./dataset/k400_examples \
     -o CheckDataset.split.enable=True \
@@ -180,16 +354,16 @@ CheckDataset:
 </code></pre></details>
 
 ### 4.2 模型训练
-一条命令即可完成模型的训练,以此处视频分类模型 PPTSMv2_LCNet_k400_8frames_uniform 的训练为例:
+一条命令即可完成模型的训练,以此处视频分类模型 PP-TSMv2-LCNetV2_8frames_uniform 的训练为例:
 
 ```
-python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml  \
+python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml  \
     -o Global.mode=train \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
 需要如下几步:
 
-* 指定模型的`.yaml` 配置文件路径(此处为`PPTSMv2_LCNet_k400_8frames_uniform.yaml`,训练其他模型时,需要的指定相应的配置文件,模型和配置的文件的对应关系,可以查阅[PaddleX模型列表(CPU/GPU)](../../../support_list/models_list.md))
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TSMv2-LCNetV2_8frames_uniform.yaml`,训练其他模型时,需要指定相应的配置文件,模型和配置文件的对应关系,可以查阅[PaddleX模型列表(CPU/GPU)](../../../support_list/models_list.md))
 * 指定模式为模型训练:`-o Global.mode=train`
 * 指定训练数据集路径:`-o Global.dataset_dir`
 其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Train`下的字段来进行设置,也可以通过在命令行中追加参数来进行调整。如指定前 2 卡 gpu 训练:`-o Global.device=gpu:0,1`;设置训练轮次数为 10:`-o Train.epochs_iters=10`。更多可修改的参数及其详细解释,可以查阅模型对应任务模块的配置文件说明[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
@@ -214,13 +388,13 @@ python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k40
 在完成模型训练后,可以对指定的模型权重文件在验证集上进行评估,验证模型精度。使用 PaddleX 进行模型评估,一条命令即可完成模型的评估:
 
 ```bash
-python main.py -c  paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml  \
+python main.py -c  paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml  \
     -o Global.mode=evaluate \
     -o Global.dataset_dir=./dataset/k400_examples
 ```
 与模型训练类似,需要如下几步:
 
-* 指定模型的`.yaml` 配置文件路径(此处为`PPTSMv2_LCNet_k400_8frames_uniform.yaml`)
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TSMv2-LCNetV2_8frames_uniform.yaml`)
 * 指定模式为模型评估:`-o Global.mode=evaluate`
 * 指定验证数据集路径:`-o Global.dataset_dir`
 其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Evaluate`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
@@ -238,14 +412,14 @@ python main.py -c  paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k4
 通过命令行的方式进行推理预测,只需如下一条命令。运行以下代码前,请您下载[示例视频](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/general_video_classification_001.mp4)到本地。
 
 ```bash
-python main.py -c paddlex/configs/modules/video_classification/PPTSMv2_LCNet_k400_8frames_uniform.yaml \
+python main.py -c paddlex/configs/modules/video_classification/PP-TSMv2-LCNetV2_8frames_uniform.yaml \
     -o Global.mode=predict \
     -o Predict.model_dir="./output/best_model/inference" \
     -o Predict.input="general_video_classification_001.mp4"
 ```
 与模型训练和评估类似,需要如下几步:
 
-* 指定模型的`.yaml` 配置文件路径(此处为`PPTSMv2_LCNet_k400_8frames_uniform.yaml`)
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TSMv2-LCNetV2_8frames_uniform.yaml`)
 * 指定模式为模型推理预测:`-o Global.mode=predict`
 * 指定模型权重路径:`-o Predict.model_dir="./output/best_model/inference"`
 * 指定输入数据路径:`-o Predict.input="..."`

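The result fields documented in the video-classification doc above (`input_path`, `class_ids`, `scores`, `label_names`) can be consumed programmatically. A minimal standalone sketch, using plain Python lists where the real pipeline returns numpy arrays:

```python
# Standalone sketch: reading a video-classification result dict.
# Plain lists stand in for the numpy arrays the real pipeline returns.

def top_prediction(res: dict) -> tuple:
    """Return the highest-scoring (label, score) pair from a result dict."""
    pairs = sorted(
        zip(res["label_names"], res["scores"]),
        key=lambda p: p[1],
        reverse=True,
    )
    return pairs[0]

result = {
    "input_path": "general_video_classification_001.mp4",
    "class_ids": [0],
    "scores": [0.91997],
    "label_names": ["abseiling"],
}

label, score = top_prediction(result)
print(label, score)  # abseiling 0.91997
```

The same pattern works for `topk > 1`, where `label_names` and `scores` hold one entry per predicted class.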
+ 11 - 9
paddlex/inference/pipelines_new/formula_recognition/pipeline.py

@@ -178,10 +178,10 @@ class FormulaRecognitionPipeline(BasePipeline):
     def predict(
         self,
         input: str | list[str] | np.ndarray | list[np.ndarray],
-        use_layout_detection: bool = True,
-        use_doc_orientation_classify: bool = False,
-        use_doc_unwarping: bool = False,
-        layout_det_res: DetResult = None,
+        use_layout_detection: Optional[bool] = None,
+        use_doc_orientation_classify: Optional[bool] = None,
+        use_doc_unwarping: Optional[bool] = None,
+        layout_det_res: Optional[DetResult] = None,
         **kwargs,
     ) -> FormulaRecognitionResult:
         """
@@ -189,10 +189,10 @@ class FormulaRecognitionPipeline(BasePipeline):
 
         Args:
             input (str | list[str] | np.ndarray | list[np.ndarray]): The input image(s) of pdf(s) to be processed.
-            use_layout_detection (bool): Whether to use layout detection.
-            use_doc_orientation_classify (bool): Whether to use document orientation classification.
-            use_doc_unwarping (bool): Whether to use document unwarping.
-            layout_det_res (DetResult): The layout detection result.
+            use_layout_detection (Optional[bool]): Whether to use layout detection.
+            use_doc_orientation_classify (Optional[bool]): Whether to use document orientation classification.
+            use_doc_unwarping (Optional[bool]): Whether to use document unwarping.
+            layout_det_res (Optional[DetResult]): The layout detection result.
                 It will be used if it is not None and use_layout_detection is False.
             **kwargs: Additional keyword arguments.
 
@@ -248,7 +248,9 @@ class FormulaRecognitionPipeline(BasePipeline):
                     layout_det_res = next(self.layout_det_model(doc_preprocessor_image))
                 for box_info in layout_det_res["boxes"]:
                     if box_info["label"].lower() in ["formula"]:
-                        crop_img_info = self._crop_by_boxes(image_array, [box_info])
+                        crop_img_info = self._crop_by_boxes(
+                            doc_preprocessor_image, [box_info]
+                        )
                         crop_img_info = crop_img_info[0]
                         single_formula_rec_res = (
                             self.predict_single_formula_recognition_res(

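The signature change in `pipeline.py` above replaces hard-coded defaults with `Optional[bool] = None`, which conventionally lets `None` mean "fall back to the pipeline's configured default" while an explicit `True`/`False` overrides it per call. A standalone sketch of that resolution pattern (the class and attribute names here are illustrative, not PaddleX's actual internals):

```python
from typing import Optional


class PipelineSketch:
    """Illustrative stand-in for a pipeline with configured defaults."""

    def __init__(self, use_layout_detection: bool = True,
                 use_doc_unwarping: bool = False):
        self._defaults = {
            "use_layout_detection": use_layout_detection,
            "use_doc_unwarping": use_doc_unwarping,
        }

    def predict(self, use_layout_detection: Optional[bool] = None,
                use_doc_unwarping: Optional[bool] = None) -> dict:
        # None means "use the value the pipeline was configured with";
        # an explicit True/False overrides it for this call only.
        return {
            "use_layout_detection": (
                self._defaults["use_layout_detection"]
                if use_layout_detection is None else use_layout_detection
            ),
            "use_doc_unwarping": (
                self._defaults["use_doc_unwarping"]
                if use_doc_unwarping is None else use_doc_unwarping
            ),
        }


p = PipelineSketch()
print(p.predict())                            # configured defaults
print(p.predict(use_layout_detection=False))  # per-call override
```

This is why `None` rather than `False` is the safer default: `False` would silently override whatever the pipeline was configured with.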
+ 20 - 11
paddlex/inference/pipelines_new/formula_recognition/result.py

@@ -53,15 +53,24 @@ class FormulaRecognitionResult(BaseCVResult):
             Dict[str, Image.Image]: An image with detection boxes, texts, and scores blended on it.
         """
         image = Image.fromarray(self["doc_preprocessor_res"]["output_img"])
+        res_img_dict = {}
+        model_settings = self["model_settings"]
+        if model_settings["use_doc_preprocessor"]:
+            res_img_dict.update(**self["doc_preprocessor_res"].img)
+
+        layout_det_res = self["layout_det_res"]
+        if len(layout_det_res) > 0:
+            res_img_dict["layout_det_res"] = layout_det_res.img["res"]
         try:
             env_valid()
         except subprocess.CalledProcessError as e:
             logging.warning(
                 "Please refer to 2.3 Formula Recognition Pipeline Visualization in Formula Recognition Pipeline Tutorial to install the LaTeX rendering engine at first."
             )
-            return {f"formula_res_img": image}
+            res_img_dict["formula_res_img"] = image
+            return res_img_dict
 
-        if len(self["layout_det_res"]) <= 0:
+        if len(layout_det_res) <= 0:
             image = np.array(image.convert("RGB"))
             rec_formula = self["formula_res_list"][0]["rec_formula"]
             xywh = crop_white_area(image)
@@ -92,10 +101,12 @@ class FormulaRecognitionResult(BaseCVResult):
                 )
                 new_image.paste(image, (0, 0))
                 new_image.paste(img_formula, (image.width + 10, 0))
-                return {f"formula_res_img": new_image}
+                res_img_dict["formula_res_img"] = new_image
+                return res_img_dict
             except subprocess.CalledProcessError as e:
                 logging.warning("Syntax error detected in formula, rendering failed.")
-                return {f"formula_res_img": image}
+                res_img_dict["formula_res_img"] = image
+                return res_img_dict
 
         h, w = image.height, image.width
         img_left = image.copy()
@@ -137,11 +148,7 @@ class FormulaRecognitionResult(BaseCVResult):
         img_show = Image.new("RGB", (int(w * 2), h), (255, 255, 255))
         img_show.paste(img_left, (0, 0, w, h))
         img_show.paste(Image.fromarray(img_right), (w, 0, w * 2, h))
-
-        model_settings = self["model_settings"]
-        res_img_dict = {f"formula_res_img": img_show}
-        if model_settings["use_doc_preprocessor"]:
-            res_img_dict.update(**self["doc_preprocessor_res"].img)
+        res_img_dict["formula_res_img"] = img_show
         return res_img_dict
 
     def _to_str(self, *args, **kwargs) -> Dict[str, str]:
@@ -159,7 +166,8 @@ class FormulaRecognitionResult(BaseCVResult):
         data["model_settings"] = self["model_settings"]
         if self["model_settings"]["use_doc_preprocessor"]:
             data["doc_preprocessor_res"] = self["doc_preprocessor_res"].str["res"]
-
+        if len(self["layout_det_res"]) > 0:
+            data["layout_det_res"] = self["layout_det_res"].str["res"]
         data["formula_res_list"] = []
         for tno in range(len(self["formula_res_list"])):
             rec_formula_dict = {
@@ -190,7 +198,8 @@ class FormulaRecognitionResult(BaseCVResult):
         data["model_settings"] = self["model_settings"]
         if self["model_settings"]["use_doc_preprocessor"]:
             data["doc_preprocessor_res"] = self["doc_preprocessor_res"].str["res"]
-
+        if len(self["layout_det_res"]) > 0:
+            data["layout_det_res"] = self["layout_det_res"].str["res"]
         data["formula_res_list"] = []
         for tno in range(len(self["formula_res_list"])):
             rec_formula_dict = {

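The `result.py` refactor above moves the construction of `res_img_dict` to the top of the method so that every return path, including the early returns taken when LaTeX rendering is unavailable or fails, still carries the optional preprocessor and layout visualizations. A standalone sketch of that shape (all names here are illustrative, with strings standing in for images):

```python
# Standalone sketch of the result.py refactor: build the image dict up
# front so every early-return path still carries the optional
# preprocessor / layout visualizations. Names are illustrative.

def collect_result_images(res: dict) -> dict:
    imgs = {}
    # Optional visualizations are added first, unconditionally ...
    if res["model_settings"]["use_doc_preprocessor"]:
        imgs["preprocessed_img"] = res["doc_preprocessor_img"]
    if res["layout_boxes"]:  # only present when layout detection ran
        imgs["layout_det_res"] = res["layout_img"]
    # ... so that whichever branch produces the formula image (rendered
    # or fallback), it is merged into the same dict rather than
    # returned alone.
    imgs["formula_res_img"] = res["formula_img"]
    return imgs


res = {
    "model_settings": {"use_doc_preprocessor": True},
    "doc_preprocessor_img": "<preprocessed image>",
    "layout_boxes": [{"label": "formula"}],
    "layout_img": "<layout image>",
    "formula_img": "<formula image>",
}
print(sorted(collect_result_images(res)))
```

Before the refactor, the early `return {"formula_res_img": image}` paths dropped the preprocessor image entirely; collecting into one dict fixes that without duplicating the conditionals in each branch.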
+ 1 - 0
requirements.txt

@@ -3,6 +3,7 @@ imagesize
 colorlog
 PyYAML
 filelock
+ftfy
 ruamel.yaml
 chardet
 numpy==1.24.4