1 year ago · 1653e892d5
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md
@@ -8,20 +8,20 @@
 
				 
			
 
				 ![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/01.jpg)
			
 
				 
			
 
				-**通用****公式识别****产线中包含版面区域分析模块和公式识别模块**。
			
 
				+**通用****公式识别****产线中包含版面区域检测模块和公式识别模块**。
			
 
				 
			
 
				 **如您更考虑模型精度，请选择精度较高的模型，如您更考虑模型推理速度，请选择推理速度较快的模型，如您更考虑模型存储大小，请选择存储大小较小的模型**。
			
 
				 
			
 
				 <details>
			
 
				    <summary> 👉模型列表详情</summary>
			
 
				 
			
 
				-**版面区域分析模块模型：**
			
 
				+**版面区域检测模块模型：**
			
 
				 
			
 
				 |模型名称|mAP（%）|GPU推理耗时（ms）|CPU推理耗时|模型存储大小（M)|
			
 
				 |-|-|-|-|-|
			
 
				 |RT-DETR-H_layout_17cls|92.6|115.126|3827.25|470.2M|
			
 
				 
			
 
				-**注：以上精度指标的评估集是 PaddleX 自建的版面区域分析数据集，包含 1w 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器，精度类型为 FP32， CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz，线程数为8，精度类型为 FP32。**
			
 
				+**注：以上精度指标的评估集是 PaddleX 自建的版面区域检测数据集，包含 1w 张图片。以上所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器，精度类型为 FP32， CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz，线程数为8，精度类型为 FP32。**
			
 
				 
			
 
				 **公式识别模块模型：**
			
 
				 |模型名称|BLEU score|normed edit distance|ExpRate （%）|GPU推理耗时（ms）|CPU推理耗时（ms）|模型存储大小|
			
@@ -99,11 +99,12 @@ paddlex --pipeline ./formula_recognition.yaml --input formula_recognition.jpg
 
				        [1041.6333, 1530.7142],
			
 
				        [ 524.9582, 1530.7142]], dtype=float32)], 'rec_formula': ['F({\bf x})=C(F_{1}(x_{1}),\cdot\cdot\cdot,F_{N}(x_{N})).\qquad\qquad\qquad(1)', 'p(\mathbf{x})=c(\mathbf{u})\prod_{i}p(x_{i}).\qquad\qquad\qquad\qquad\qquad\quad\quad~~\quad~~~~~~~~~~~~~~~(2)', 'H_{c}({\bf x})=-\int_{{\bf{u}}}c({\bf{u}})\log c({\bf{u}})d{\bf{u}}.~~~~~~~~~~~~~~~~~~~~~(3)', 'I({\bf x})=-H_{c}({\bf x}).\qquad\qquad\qquad\qquad(4)', 'H({\bf x})=\sum_{i}H(x_{i})+H_{c}({\bf x}).\eqno\qquad\qquad\qquad(5)']}
			
 
				 ```
			
 
				-其中，dt_polys为检测到的公式区域坐标，，rec_formula为检测到的公式。
			
 
				+其中，dt_polys为检测到的公式区域坐标， rec_formula为检测到的公式。
			
 
				 </details>
			
 
				 
			
 
				 可视化结果如下：
			
 
				-![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/03.png)
			
 
				+
			
 
				+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/02.jpg)
			
 
				 
			
 
				 可视化图片默认保存在 `output` 目录下，您也可以通过 `--save_path` 进行自定义。此外，您可以通过网站 [https://www.lddgo.net/math/latex-to-image](https://www.lddgo.net/math/latex-to-image) 对识别出来的LaTeX代码进行可视化。
			
 
				 
			
@@ -690,9 +691,9 @@ print_r($result["texts"]);
 
				 如果通用公式识别产线提供的默认模型权重在您的场景中，精度或速度不满意，您可以尝试利用**您自己拥有的特定领域或应用场景的数据**对现有模型进行进一步的**微调**，以提升通用公式识别产线的在您的场景中的识别效果。
			
 
				 
			
 
				 ### 4.1 模型微调
			
 
				-由于通用通用公式识别产线包含两个模块（版面区域分析模块和公式识别），模型产线的效果不及预期可能来自于其中任何一个模块。
			
 
				+由于通用通用公式识别产线包含两个模块（版面区域检测模块和公式识别），模型产线的效果不及预期可能来自于其中任何一个模块。
			
 
				 
			
 
				-您可以对识别效果差的图片进行分析，如果在分析过程中发现有较多的公式未被检测出来（即公式漏检现象），那么可能是版面区域分析模型存在不足，您需要参考[版面区域检测模块开发教程](../../../module_usage/tutorials/ocr_modules/layout_detection.md)中的[二次开发](../../../module_usage/tutorials/ocr_modules/layout_detection.md#四二次开发)章节，使用您的私有数据集对版面区域分析模型进行微调；如果在已检测到的公式中出现较多的识别错误（即识别出的公式内容与实际公式内容不符），这表明公式识别模型需要进一步改进，您需要参考[公式识别模块开发教程](../../../module_usage/tutorials/ocr_modules/formula_recognition.md)中的中的[二次开发](../../../module_usage/tutorials/ocr_modules/formula_recognition.md#四二次开发)章节,对公式识别模型进行微调。
			
 
				+您可以对识别效果差的图片进行分析，如果在分析过程中发现有较多的公式未被检测出来（即公式漏检现象），那么可能是版面区域检测模型存在不足，您需要参考[版面区域检测模块开发教程](../../../module_usage/tutorials/ocr_modules/layout_detection.md)中的[二次开发](../../../module_usage/tutorials/ocr_modules/layout_detection.md#四二次开发)章节，使用您的私有数据集对版面区域检测模型进行微调；如果在已检测到的公式中出现较多的识别错误（即识别出的公式内容与实际公式内容不符），这表明公式识别模型需要进一步改进，您需要参考[公式识别模块开发教程](../../../module_usage/tutorials/ocr_modules/formula_recognition.md)中的中的[二次开发](../../../module_usage/tutorials/ocr_modules/formula_recognition.md#四二次开发)章节,对公式识别模型进行微调。
			
 
				 
			
 
				 ### 4.2 模型应用
			
 
				 当您使用私有数据集完成微调训练后，可获得本地模型权重文件。
			
--- a/docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition_en.md
+++ b/docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition_en.md
@@ -8,20 +8,20 @@ Formula recognition is a technology that automatically identifies and extracts L
 
				 
			
 
				 ![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/01.jpg)
			
 
				 
			
 
				-**The General Formula Recognition Pipeline comprises a layout analysis module and a formula recognition module.**
			
 
				+**The General Formula Recognition Pipeline comprises a layout detection module and a formula recognition module.**
			
 
				 
			
 
				 **If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, select a model with faster inference. If you prioritize model size, choose a model with a smaller storage footprint.**
			
 
				 
			
 
				 <details>
			
 
				    <summary> 👉Model List Details</summary>
			
 
				 
			
 
				-**Layout Analysis Module Models**:
			
 
				+**Layout Detection Module Models**:
			
 
				 
			
 
				 | Model Name | mAP (%) | GPU Inference Time (ms) | CPU Inference Time | Model Size (M) |
			
 
				 |-|-|-|-|-|
			
 
				 | RT-DETR-H_layout_17cls | 92.6 | 115.126 | 3827.25 | 470.2M |
			
 
				 
			
 
				-**Note: The above accuracy metrics are evaluated on PaddleX's self-built layout analysis dataset, containing 10,000 images. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
			
 
				+**Note: The above accuracy metrics are evaluated on PaddleX's self-built layout detection dataset, containing 10,000 images. All GPU inference times are based on an NVIDIA Tesla T4 machine with FP32 precision. CPU inference speeds are based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.**
			
 
				 
			
 
				 **Formula Recognition Module Models**:
			
 
				 
			
@@ -103,7 +103,7 @@ Where `dt_polys` represents the coordinates of the detected formula area, and `r
 
				 </details>
			
 
				 
			
 
				 The visualization result is as follows:
			
 
				-![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/03.png)
			
 
				+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/formula_recognition/02.jpg)
			
 
				 
			
 
				 The visualization image is saved in the `output` directory by default, and you can also customize it through `--save_path`. Additionally, you can visualize the recognized LaTeX code through the website [https://www.lddgo.net/math/latex-to-image](https://www.lddgo.net/math/latex-to-image).
			
 
				 
			
@@ -221,7 +221,7 @@ Operations provided by the service:
 
				 
			
 
				         | Name | Type | Description | Required |
			
 
				         |------|------|-------------|----------|
			
 
				-        |`maxLongSide`|`integer`|During inference, if the length of the longer side of the input image for the layout analysis model is greater than `maxLongSide`, the image will be scaled so that the length of the longer side equals `maxLongSide`.|No|
			
 
				+        |`maxLongSide`|`integer`|During inference, if the length of the longer side of the input image for the layout detection model is greater than `maxLongSide`, the image will be scaled so that the length of the longer side equals `maxLongSide`.|No|
			
 
				 
			
 
				     - When the request is processed successfully, the `result` in the response body has the following properties:
			
 
				 
			
@@ -668,11 +668,11 @@ You can choose the appropriate deployment method based on your needs to proceed
 
				 
			
 
				 ## 4. Customization and Fine-tuning
			
 
				 If the default model weights provided by the general formula recognition pipeline do not meet your requirements for accuracy or speed in your specific scenario, you can try to further fine-tune the existing models using **your own domain-specific or application-specific data** to improve the recognition performance of the general formula recognition pipeline in your scenario.
			
 
				- 
			
 
				+
			
 
				 ### 4.1 Model Fine-tuning
			
 
				-Since the general formula recognition pipeline consists of two modules (layout analysis and formula recognition), unsatisfactory performance may stem from either module.
			
 
				+Since the general formula recognition pipeline consists of two modules (layout detection and formula recognition), unsatisfactory performance may stem from either module.
			
 
				 
			
 
				-You can analyze images with poor recognition results. If you find that many formula are undetected (i.e., formula miss detection), it may indicate that the layout analysis model needs improvement. You should refer to the [Customization](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md#iv-custom-development) section in the [Layout Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md) and use your private dataset to fine-tune the layout analysis model. If many recognition errors occur in detected formula (i.e., the recognized formula content does not match the actual formula content), it suggests that the formula recognition model requires further refinement. You should refer to the [Customization](../../../module_usage/tutorials/ocr_modules/formula_recognition_en.md#iv-custom-development) section in the [Formula Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/formula_recognition_en.md) and fine-tune the formula recognition model.
			
 
				+You can analyze images with poor recognition results. If you find that many formula are undetected (i.e., formula miss detection), it may indicate that the layout detection model needs improvement. You should refer to the [Customization](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md#iv-custom-development) section in the [Layout Detection Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/layout_detection_en.md) and use your private dataset to fine-tune the layout detection model. If many recognition errors occur in detected formula (i.e., the recognized formula content does not match the actual formula content), it suggests that the formula recognition model requires further refinement. You should refer to the [Customization](../../../module_usage/tutorials/ocr_modules/formula_recognition_en.md#iv-custom-development) section in the [Formula Recognition Module Development Tutorial](../../../module_usage/tutorials/ocr_modules/formula_recognition_en.md) and fine-tune the formula recognition model.
			
 
				 
			
 
				 ### 4.2 Model Application
			
 
				 After fine-tuning with your private dataset, you will obtain local model weights files.
			
@@ -682,8 +682,8 @@ If you need to use the fine-tuned model weights, simply modify the pipeline conf
 
				 ```bash
			
 
				 ......
			
 
				 Pipeline:
			
 
				-  layout_model: RT-DETR-H_layout_17cls #可修改为微调后版面区域检测模型的本地路径
			
 
				-  formula_rec_model: LaTeX_OCR_rec #可修改为微调后公式识别模型的本地路径
			
 
				+  layout_model: RT-DETR-H_layout_17cls # Can be replaced with the local path of the fine-tuned layout detection model
			
 
				+  formula_rec_model: LaTeX_OCR_rec # Can be replaced with the local path of the fine-tuned formula recognition model
			
 
				   formula_rec_batch_size: 5
			
 
				   device: "gpu:0"
			
 
				 ......
			
@@ -706,4 +706,4 @@ Now, if you want to switch the hardware to Ascend NPU, you only need to modify t
 
				 paddlex --pipeline formula_recognition --input general_formula_recognition.png --device npu:0
			
 
				 ```
			
 
				 
			
 
				-If you want to use the general formula recognition pipeline on more types of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/installation_other_devices_en.md).
			
 
				+If you want to use the general formula recognition pipeline on more types of hardware, please refer to the [PaddleX Multi-Hardware Usage Guide](../../../other_devices_support/installation_other_devices_en.md).
			
--- a/paddlex/inference/pipelines/formula_recognition.py
+++ b/paddlex/inference/pipelines/formula_recognition.py
@@ -49,7 +49,7 @@ class FormulaRecognitionPipeline(BasePipeline):
 
				 
			
 
				     def predict(self, x, **kwargs):
			
 
				         device = kwargs.get("device", None)
			
 
				-        for layout_pred in self.layout_predictor(x):
			
 
				+        for layout_pred in self.layout_predictor(x, device=device):
			
 
				             single_img_res = {
			
 
				                 "input_path": "",
			
 
				                 "layout_result": {},
			
@@ -62,7 +62,9 @@ class FormulaRecognitionPipeline(BasePipeline):
 
				             single_img_res["dt_polys"] = []
			
 
				             single_img_res["rec_formula"] = []
			
 
				             all_subs_of_formula_img = []
			
 
				-            layout_pred["boxes"] = sorted(layout_pred["boxes"], key=lambda x : self.sorted_formula_box(x))
			
 
				+            layout_pred["boxes"] = sorted(
			
 
				+                layout_pred["boxes"], key=lambda x: self.sorted_formula_box(x)
			
 
				+            )
			
 
				             if len(layout_pred["boxes"]) > 0:
			
 
				                 subs_of_img = list(self._crop_by_boxes(layout_pred))
			
 
				                 # get cropped images with label "formula"
			
@@ -70,19 +72,21 @@ class FormulaRecognitionPipeline(BasePipeline):
 
				                     if sub["label"].lower() == "formula":
			
 
				                         boxes = sub["box"]
			
 
				                         x1, y1, x2, y2 = list(boxes)
			
 
				-                        poly = np.array([[x1, y1],[ x2, y1], [x2, y2], [x1, y2]])
			
 
				+                        poly = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]])
			
 
				                         all_subs_of_formula_img.append(sub["img"])
			
 
				                         single_img_res["dt_polys"].append(poly)
			
 
				-                if len(all_subs_of_formula_img)>0:
			
 
				+                if len(all_subs_of_formula_img) > 0:
			
 
				                     for formula_res in self.formula_predictor(
			
 
				                         all_subs_of_formula_img,
			
 
				                         batch_size=kwargs.get("formula_rec_batch_size", 1),
			
 
				                         device=device,
			
 
				                     ):
			
 
				-                        single_img_res["rec_formula"].append(str(formula_res["rec_text"]))
			
 
				+                        single_img_res["rec_formula"].append(
			
 
				+                            str(formula_res["rec_text"])
			
 
				+                        )
			
 
				             yield FormulaRecResult(single_img_res)
			
 
				 
			
 
				     def sorted_formula_box(self, x):
			
 
				         coordinate = x["coordinate"]
			
 
				         x1, y1, x2, y2 = list(coordinate)
			
 
				-        return (y1+y2)/2
			
 
				+        return (y1 + y2) / 2
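Review note: the hunks above order layout boxes by their vertical centre and turn each `formula` box into a four-point polygon. The following is a minimal, standalone sketch of that idea, not the pipeline's actual code. The sample boxes and the helper names `vertical_center` and `box_to_poly` are hypothetical, and the sketch assumes every box carries a `label` and an `(x1, y1, x2, y2)` `coordinate`.

```python
def vertical_center(box):
    """Return the vertical midpoint of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box["coordinate"]
    return (y1 + y2) / 2


def box_to_poly(coordinate):
    """Expand (x1, y1, x2, y2) into four corners, clockwise from top-left."""
    x1, y1, x2, y2 = coordinate
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]


# Hypothetical layout output, deliberately out of reading order.
boxes = [
    {"label": "formula", "coordinate": [10, 300, 200, 340]},
    {"label": "text", "coordinate": [10, 20, 200, 60]},
    {"label": "Formula", "coordinate": [10, 100, 200, 140]},
]

# Sort top to bottom, then keep only formula regions (case-insensitive match).
ordered = sorted(boxes, key=vertical_center)
formula_polys = [
    box_to_poly(b["coordinate"]) for b in ordered if b["label"].lower() == "formula"
]

for poly in formula_polys:
    print(poly)
```

Sorting on the vertical centre alone gives a top-to-bottom order; it does not account for multi-column pages, where boxes at similar heights in different columns would interleave.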
			
--- a/paddlex/inference/results/formula_rec.py
+++ b/paddlex/inference/results/formula_rec.py
@@ -27,9 +27,8 @@ class FormulaRecResult(CVResult):
 
				     _HARD_FLAG = False
			
 
				 
			
 
				     def _to_str(self):
			
 
				-        rec_formula_str = ", ".join([str(formula) for formula in self['rec_formula']])
			
 
				-        return str(self).replace("\\\\","\\")
			
 
				-
			
 
				+        rec_formula_str = ", ".join([str(formula) for formula in self["rec_formula"]])
			
 
				+        return str(self).replace("\\\\", "\\")
			
 
				 
			
 
				     def get_minarea_rect(self, points):
			
 
				         bounding_box = cv2.minAreaRect(points)
			
@@ -54,15 +53,11 @@ class FormulaRecResult(CVResult):
 
				         ).astype(np.int32)
			
 
				 
			
 
				         return box
			
 
				-    
			
 
				-   
			
 
				+
			
 
				     def _to_img(
			
 
				         self,
			
 
				     ):
			
 
				-        """draw ocr result"""
			
 
				-        # TODO(gaotingquan): mv to postprocess
			
 
				-        drop_score = 0.5
			
 
				-
			
 
				+        """draw formula result"""
			
 
				         boxes = self["dt_polys"]
			
 
				         formula = self["rec_formula"]
			
 
				         image = self._img_reader.read(self["input_path"])
			
@@ -99,18 +94,3 @@ class FormulaRecResult(CVResult):
 
				         img_show = Image.new("RGB", (w, h), (255, 255, 255))
			
 
				         img_show.paste(img_left, (0, 0, w, h))
			
 
				         return img_show
			
 
				-
			
 
				-
			
 
				-def create_font(txt, sz, font_path):
			
 
				-    """create font"""
			
 
				-    font_size = int(sz[1] * 0.8)
			
 
				-    font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
			
 
				-    if int(PIL.__version__.split(".")[0]) < 10:
			
 
				-        length = font.getsize(txt)[0]
			
 
				-    else:
			
 
				-        length = font.getlength(txt)
			
 
				-
			
 
				-    if length > sz[0]:
			
 
				-        font_size = int(font_size * sz[0] / length)
			
 
				-        font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
			
 
				-    return font
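Review note: `_to_str` in this file replaces doubled backslashes in the stringified result. A plausible reason, assuming the result stringifies like a plain Python dict, is that the dict representation escapes every backslash in a LaTeX string, so the printed formula would otherwise show `\\frac` instead of `\frac`. The sketch below uses an ordinary dict as a stand-in for the result object.

```python
# Stand-in for a result object; "\\frac" is a single backslash at runtime.
result = {"rec_formula": ["\\frac{a}{b}"]}

# str() on a dict uses repr() for its values, which escapes each backslash.
raw = str(result)

# Collapse each escaped pair back to one backslash so the LaTeX reads naturally.
readable = raw.replace("\\\\", "\\")

print(raw)
print(readable)
```

The replacement is applied to the whole string, so it would also collapse a genuine LaTeX line break (`\\`) in a recognised formula into a single backslash.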
			
--- a/paddlex/pipelines/formula_recognition.yaml
+++ b/paddlex/pipelines/formula_recognition.yaml
@@ -1,6 +1,6 @@
 
				 Global:
			
 
				   pipeline_name: formula_recognition
			
 
				-  input: https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_001.png
			
 
				+  input: https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/general_formula_recognition.png
			
 
				   
			
 
				 Pipeline:
			
 
				   layout_model: RT-DETR-H_layout_17cls