1 year ago · b7978671fc
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md
@@ -20,13 +20,14 @@
 
 **版面区域分析模块模型：**
 
-|模型名称|mAP（%）|GPU推理耗时（ms）|CPU推理耗时|模型存储大小（M）|
-|-|-|-|-|-|
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
+|模型|mAP(0.5)（%）|GPU推理耗时（ms）|CPU推理耗时 (ms)|模型存储大小（M）|介绍|
+|-|-|-|-|-|-|
+|PicoDet-L_layout_3cls|89.3|15.7|159.8|22.6|基于PicoDet-L在中英文论文、杂志和研报等场景上自建数据集训练的高效率版面区域定位模型，包含3个类别：表格，图像和印章|
+|RT-DETR-H_layout_3cls|95.9|114.6|3832.6|470.1|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型，包含3个类别：表格，图像和印章|
+|RT-DETR-H_layout_17cls|92.6|115.1|3827.2|470.2|基于RT-DETR-H在中英文论文、杂志和研报等场景上自建数据集训练的高精度版面区域定位模型，包含17个版面常见类别，分别是：段落标题、图片、文本、数字、摘要、内容、图表标题、公式、表格、表格标题、参考文献、文档标题、脚注、页眉、算法、页脚、印章|
+
 
-**注：以上精度指标的评估集是 PaddleX 自建的版面区域分析数据集，包含 1w 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器，精度类型为 FP32， CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz，线程数为8，精度类型为 FP32。**
+**注：以上精度指标的评估集是 PaddleOCR 自建的版面区域分析数据集，包含中英文论文、杂志和研报等常见的 1w 张文档类型图片。GPU 推理耗时基于 NVIDIA Tesla T4 机器，精度类型为 FP32， CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz，线程数为 8，精度类型为 FP32。**
 
 **印章文本检测模块模型：**
 
@@ -39,26 +40,18 @@
 
 **文本识别模块模型：**
 
-|模型名称|识别Avg Accuracy(%)|GPU推理耗时（ms）|CPU推理耗时|模型存储大小（M）|
-|-|-|-|-|-|
-|PP-OCRv4_mobile_rec |78.20|7.95018|46.7868|10.6 M|
-|PP-OCRv4_server_rec |79.20|7.19439|140.179|71.2 M|
+|模型名称|识别Avg Accuracy(%)|GPU推理耗时（ms）|CPU推理耗时|模型存储大小（M）|介绍|
+|-|-|-|-|-|-|
+|PP-OCRv4_mobile_rec |78.20|7.95018|46.7868|10.6 M|PP-OCRv4的移动端文本识别模型，效率更高，适合在端侧部署|
+|PP-OCRv4_server_rec |79.20|7.19439|140.179|71.2 M|PP-OCRv4的服务端文本识别模型，精度更高，适合在较好的服务器上部署|
 
 **注：以上精度指标的评估集是 PaddleOCR 自建的中文数据集 ，覆盖街景、网图、文档、手写多个场景，其中文本识别包含 1.1w 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器，精度类型为 FP32， CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz，线程数为8，精度类型为 FP32。**
 
 </details>
 
 ## 2. 快速开始
-PaddleX 所提供的预训练的模型产线均可以快速体验效果，你可以在线体验印章文本识别产线的效果，也可以在本地使用命令行或 Python 体验印章文本识别产线的效果。
-
-### 2.1 在线体验
-您可以[在线体验](https://aistudio.baidu.com/community/app/182491/webUI)文档场景信息抽取v3产线中的印章文本识别的效果，用官方提供的 Demo 图片进行识别，例如：
-
-![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/02.png)
-
-如果您对产线运行的效果满意，可以直接对产线进行集成部署，如果不满意，您也可以利用私有数据**对产线中的模型进行在线微调**。
+PaddleX 所提供的预训练的模型产线均可以快速体验效果，你可以在本地使用命令行或 Python 体验印章文本识别产线的效果。
 
-### 2.2 本地体验
 在本地使用印章文本识别产线前，请确保您已经按照[PaddleX本地安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
 
 ### 2.1 命令行方式体验
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition_en.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition_en.md
@@ -17,13 +17,13 @@ The **Seal Recognition** pipeline includes a layout area analysis module, a seal
 
 **Layout Analysis Module Models:**
 
-|Model Name|mAP (%)|GPU Inference Time (ms)|CPU Inference Time|Model Size (M)|
-|-|-|-|-|-|
-|PicoDet-L_layout_3cls|89.3|15.7425|159.771|22.6 M|
-|RT-DETR-H_layout_3cls|95.9|114.644|3832.62|470.1M|
-|RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
+| Model | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
+|-|-|-|-|-|-|
+| PicoDet-L_layout_3cls | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-constructed dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_3cls | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes three categories: tables, images, and seals. |
+| RT-DETR-H_layout_17cls | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-constructed dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports includes 17 common layout categories, namely: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
 
-**Note: The evaluation set for the above accuracy indicators is a self-built layout area analysis dataset from PaddleX, containing 10,000 images. The GPU inference time for all models above is based on an NVIDIA Tesla T4 machine with a precision type of FP32. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads, and the precision type is also FP32.**
+**Note: The evaluation set for the above accuracy metrics is PaddleOCR's self-built layout region analysis dataset, containing 10,000 images of common document types, including English and Chinese papers, magazines, research reports, etc. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
 
 
 **Seal Detection Module Models**:
@@ -50,17 +50,10 @@ The **Seal Recognition** pipeline includes a layout area analysis module, a seal
 ## 2.  Quick Start
 The pre trained model production line provided by PaddleX can quickly experience the effect. You can experience the effect of the seal recognition production line online, or use the command line or Python locally to experience the effect of the seal recognition production line.
 
-### 2.1 Online Experience
-You can [experience online](https://aistudio.baidu.com/community/app/182491/webUI) the effect of seal recognition in the v3 production line for extracting document scene information, using official demo images for recognition, for example:
-
-! []( https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/seal_recognition/02.png )
-
-If you are satisfied with the performance of the production line, you can directly integrate and deploy the production line. If you are not satisfied, you can also use private data to fine tune the models in the production line online.
 
-### 2.2 Local Experience
 Before using the seal recognition production line locally, please ensure that you have completed the wheel package installation of PaddleX according to the  [PaddleX Local Installation Guide](../../../installation/installation_en.md).
 
-### 2.3 Command line experience
+### 2.1 Command line experience
 One command can quickly experience the effect of seal recognition production line, use [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png), and replace ` --input ` with the local path for prediction
 
 ```
--- a/paddlex/inference/components/paddle_predictor/predictor.py
+++ b/paddlex/inference/components/paddle_predictor/predictor.py
@@ -205,12 +205,12 @@ class ImagePredictor(BasePaddlePredictor):
 
 class ImageDetPredictor(BasePaddlePredictor):
 
-    INPUT_KEYS = [["img", "scale_factors"], ["img", "scale_factors", "img_size"]]
+    INPUT_KEYS = [["img", "scale_factors"], ["img", "scale_factors", "img_size"], ["img", "img_size"]]
     OUTPUT_KEYS = [["boxes"], ["boxes", "masks"]]
     DEAULT_INPUTS = {"img": "img", "scale_factors": "scale_factors"}
     DEAULT_OUTPUTS = None
 
-    def to_batch(self, img, scale_factors, img_size=None):
+    def to_batch(self, img, scale_factors=[[1., 1.]], img_size=None):
         scale_factors = [scale_factor[::-1] for scale_factor in scale_factors]
         if img_size is None:
             return [
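Review note on the `to_batch` change above: the new default `scale_factors=[[1., 1.]]` is a mutable list literal. It appears safe in this hunk only because the parameter is rebound by the list comprehension and never mutated in place. A minimal sketch of the same reversal using a `None` sentinel follows. The function name `reverse_scale_factors` is hypothetical and is not part of PaddleX; it mirrors only the first line of `to_batch` shown in the hunk.

```python
def reverse_scale_factors(scale_factors=None):
    """Hypothetical helper; mirrors only the reversal step shown in the hunk."""
    # Build the identity scale pair per call, so no list object is shared
    # across calls through the function's default value.
    if scale_factors is None:
        scale_factors = [[1.0, 1.0]]
    # Same operation as in the diff: reverse the order within each pair.
    return [scale_factor[::-1] for scale_factor in scale_factors]


# The identity pair is unchanged by the reversal.
print(reverse_scale_factors())
# A non-trivial pair has its two entries swapped.
print(reverse_scale_factors([[0.5, 2.0]]))
```

With the sentinel, callers that supply only `img` and `img_size` (the new third entry in `INPUT_KEYS`) still get an identity scale factor, without relying on the default list never being modified.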
--- a/paddlex/inference/models/object_detection.py
+++ b/paddlex/inference/models/object_detection.py
@@ -53,6 +53,14 @@ class DetPredictor(BasicPredictor):
                 }
             )
 
+        if self.model_name == "Blazeface":
+            predictor.set_inputs(
+                {
+                    "img": "img",
+                    "img_size": "img_size",
+                }
+            )
+        
         self._add_component(
             [
                 predictor,
@@ -94,7 +102,7 @@ class DetPredictor(BasicPredictor):
         if norm_type != "mean_std":
             mean = 0
             std = 1
-        return Normalize(mean=mean, std=std)
+        return Normalize(scale=scale, mean=mean, std=std)
 
     @register("Permute")
     def build_to_chw(self):
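Review note on the `Normalize(scale=scale, ...)` change above: the hunk does not show where `scale` is defined or how `Normalize` applies it. The sketch below assumes the conventional formula `(x * scale - mean) / std`. That formula and the function name `normalize_value` are assumptions for illustration, not confirmed by this diff.

```python
def normalize_value(value, scale, mean=0.0, std=1.0):
    """Hypothetical scalar normalization: (value * scale - mean) / std."""
    return (value * scale - mean) / std


# With mean=0 and std=1 (the branch taken when norm_type != "mean_std"),
# only the scale factor affects the result.
scaled_only = normalize_value(4.0, scale=0.5)

# With non-trivial mean and std, all three parameters contribute.
fully_normalized = normalize_value(4.0, scale=0.5, mean=1.0, std=0.5)

print(scaled_only, fully_normalized)
```

Under this assumption, omitting `scale` (as the removed line did) would leave the result dependent on the default inside `Normalize` rather than on the configured value.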