1 year ago · 845a3ff067
--- a/.gitignore
+++ b/.gitignore
@@ -48,3 +48,6 @@ debug_utils/
 
 # sphinx docs
 _build/
+
+
+output/
--- a/README.md
+++ b/README.md
@@ -42,6 +42,7 @@
 </div>
 
 # Changelog
+- 2024/11/15 0.9.3 released. Integrated [RapidTable](https://github.com/RapidAI/RapidTable) for table recognition, improving single-table parsing speed by more than 10 times, with higher accuracy and lower GPU memory usage.
 - 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
 - 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
   - Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring high accuracy in various layouts.
@@ -246,7 +247,7 @@ You can modify certain configurations in this file to enable or disable features
         "enable": true  // The formula recognition feature is enabled by default. If you need to disable it, please change the value here to "false".
     },
     "table-config": {
-        "model": "tablemaster",  // When using structEqTable, please change to "struct_eqtable".
+        "model": "rapid_table",  // When using structEqTable, please change to "struct_eqtable".
         "enable": false, // The table recognition feature is disabled by default. If you need to enable it, please change the value here to "true".
         "max_time": 400
     }
@@ -261,7 +262,7 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
 - [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
 - Quick Deployment with Docker
 > [!IMPORTANT]
-> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
+> Docker requires a GPU with at least 8GB of VRAM, and all acceleration features are enabled by default.
 >
 > Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
 > 
@@ -421,7 +422,9 @@ This project currently uses PyMuPDF to achieve advanced functionality. However,
 # Acknowledgments
 
 - [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)
+- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
 - [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)
+- [RapidTable](https://github.com/RapidAI/RapidTable)
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
 - [layoutreader](https://github.com/ppaanngggg/layoutreader)
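The README's `table-config` example now defaults to `"rapid_table"` with `"enable": false`. As a hedged sketch, assuming the user config file is plain JSON (without the explanatory `//` comments shown in the README), switching table recognition on from Python could look like this. `enable_table_recognition` and `DEFAULT_TABLE_CONFIG` are hypothetical names; only the `"model"`, `"enable"` and `"max_time"` keys come from the diff.

```python
import json

# Defaults as documented in the README's "table-config" block.
DEFAULT_TABLE_CONFIG = {"model": "rapid_table", "enable": False, "max_time": 400}


def enable_table_recognition(config_text: str, model: str = "rapid_table") -> str:
    """Return the config JSON with table recognition switched on.

    Hypothetical helper: it only touches the "table-config" keys shown in the
    README and expects plain JSON, i.e. without // comments.
    """
    config = json.loads(config_text)
    # Insert a copy of the documented defaults if the block is missing entirely.
    table_cfg = config.setdefault("table-config", dict(DEFAULT_TABLE_CONFIG))
    table_cfg["model"] = model  # "rapid_table" or "struct_eqtable"
    table_cfg["enable"] = True
    return json.dumps(config, indent=4, ensure_ascii=False)
```

For example, passing `'{"table-config": {"model": "tablemaster", "enable": false, "max_time": 400}}'` yields a config whose `table-config` has `"model": "rapid_table"` and `"enable": true`, with `max_time` untouched. Copying the defaults with `dict(...)` keeps the module-level constant from being mutated.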
--- a/README_ja-JP.md
+++ b/README_ja-JP.md
@@ -1,3 +1,5 @@
+> [!Warning]
+> This document is outdated. Please refer to the latest documentation: [ENGLISH](README.md).
 <div id="top">
 
 <p align="center">
@@ -18,9 +20,7 @@
 <a href="https://trendshift.io/repositories/11174" target="_blank"><img src="https://trendshift.io/api/badge/repositories/11174" alt="opendatalab%2FMinerU | Trendshift" style="width: 200px; height: 55px;"/></a>
 
 
-<div align="center" style="color: red; background-color: #ffdddd; padding: 10px; border: 1px solid red; border-radius: 5px;">
-  <strong>NOTE:</strong> This document is outdated. Please refer to the latest documentation.
-</div>
+
 
 
 [English](README.md) | [简体中文](README_zh-CN.md) | [日本語](README_ja-JP.md)
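--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -42,7 +42,7 @@
 </div>
 
 # Changelog
-
+- 2024/11/15 0.9.3 released. Integrated [RapidTable](https://github.com/RapidAI/RapidTable) for table recognition; single-table parsing is more than 10x faster, with higher accuracy and lower GPU memory usage
 - 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition
 - 2024/10/31 0.9.0 released. This is a new version with extensive code refactoring that fixes numerous issues, improves performance, lowers hardware requirements, and offers better usability:
   - Refactored the sorting module to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading-order sorting, ensuring very high accuracy across all kinds of layouts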
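@@ -188,13 +188,13 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
         <td rowspan="2">GPU hardware support list</td>
         <td colspan="2">Minimum requirement: 8GB+ VRAM</td>
         <td colspan="2">3060ti/3070/4060<br>
-        8GB VRAM can enable layout, formula recognition, and OCR acceleration</td>
+        8GB VRAM can enable all acceleration features (tables: rapid_table only)</td>
         <td rowspan="2">None</td>
     </tr>
     <tr>
         <td colspan="2">Recommended: 10GB+ VRAM</td>
         <td colspan="2">3080/3080ti/3090/3090ti/4070/4070ti/4070tisuper/4080/4090<br>
-        10GB VRAM or more can enable layout, formula recognition, OCR acceleration, and table recognition acceleration at the same time<br>
+        10GB VRAM or more can enable all acceleration features<br>
         </td>
     </tr>
 </table>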
@@ -251,7 +251,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
         "enable": true  // Formula recognition is enabled by default; to disable it, change this value to "false"
     },
     "table-config": {
-        "model": "tablemaster",  // To use structEqTable, change this to "struct_eqtable"
+        "model": "rapid_table",  // To use structEqTable, change this to "struct_eqtable"
         "enable": false, // Table recognition is disabled by default; to enable it, change this value to "true"
         "max_time": 400
     }
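@@ -266,7 +266,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
 - [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
 - Quick deployment with Docker
 > [!IMPORTANT]
-> Docker requires a GPU with at least 16GB of VRAM; all acceleration features are enabled by default
+> Docker requires a GPU with at least 8GB of VRAM; all acceleration features are enabled by default
 > 
 > Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker
 > 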
@@ -431,6 +431,7 @@ TODO
 - [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)
 - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
 - [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)
+- [RapidTable](https://github.com/RapidAI/RapidTable)
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
 - [layoutreader](https://github.com/ppaanngggg/layoutreader)
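The updated GPU table in `README_zh-CN.md` ties the table model to available VRAM: 8GB cards can enable all acceleration features with tables limited to `rapid_table`, while 10GB+ cards can enable everything. The sketch below encodes one reading of that table as a hypothetical helper, `table_models_for_vram`; it is an illustration of the documented thresholds, not a rule enforced by magic-pdf.

```python
from typing import Tuple


def table_models_for_vram(vram_gb: float) -> Tuple[str, ...]:
    """Return the table models the updated README table suggests for a GPU.

    Hypothetical helper encoding one reading of the table:
      * below 8 GB: under the documented GPU minimum, so no GPU table model;
      * 8 GB up to (not including) 10 GB: tables via rapid_table only;
      * 10 GB and above: all acceleration features, so struct_eqtable as well.
    """
    if vram_gb < 8:
        return ()
    if vram_gb < 10:
        return ("rapid_table",)
    return ("rapid_table", "struct_eqtable")
```

Returning a tuple of allowed names, rather than a single choice, leaves the actual preference to the user's `table-config`.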
--- a/demo/magic_pdf_parse_main.py
+++ b/demo/magic_pdf_parse_main.py
@@ -19,9 +19,10 @@ def json_md_dump(
         pdf_name,
         content_list,
         md_content,
+        orig_model_list,
 ):
     # Write the model results to model.json
-    orig_model_list = copy.deepcopy(pipe.model_list)
+
     md_writer.write(
         content=json.dumps(orig_model_list, ensure_ascii=False, indent=4),
         path=f"{pdf_name}_model.json"
@@ -87,9 +88,12 @@ def pdf_parse_main(
 
         pdf_bytes = open(pdf_path, "rb").read()  # read the binary data of the PDF file
 
+        orig_model_list = []
+
         if model_json_path:
             # Read the raw JSON (a list) of a PDF that has already been parsed by the model
             model_json = json.loads(open(model_json_path, "r", encoding="utf-8").read())
+            orig_model_list = copy.deepcopy(model_json)
         else:
             model_json = []
 
@@ -115,8 +119,9 @@ def pdf_parse_main(
         pipe.pipe_classify()
 
         # If no model data was passed in, parse with the built-in models
-        if not model_json:
+        if len(model_json) == 0:
             pipe.pipe_analyze()  # analyze
+            orig_model_list = copy.deepcopy(pipe.model_list)
 
         # Run the parse
         pipe.pipe_parse()
@@ -126,7 +131,7 @@ def pdf_parse_main(
         md_content = pipe.pipe_mk_markdown(image_path_parent, drop_mode="none")
 
         if is_json_md_dump:
-            json_md_dump(pipe, md_writer, pdf_name, content_list, md_content)
+            json_md_dump(pipe, md_writer, pdf_name, content_list, md_content, orig_model_list)
 
         if is_draw_visualization_bbox:
             draw_visualization_bbox(pipe.pdf_mid_data['pdf_info'], pdf_bytes, output_path, pdf_name)
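The demo now snapshots the model output with `copy.deepcopy` at the moment it is produced (either loaded from JSON or right after `pipe_analyze()`), instead of copying `pipe.model_list` inside `json_md_dump` after `pipe_parse()` has run. A plausible motivation, assumed here rather than stated in the diff, is that later pipeline stages modify the model list in place. The sketch below simulates that assumption with a hypothetical `FakePipe` stand-in (not the magic_pdf pipeline API) so the effect of the snapshot is visible.

```python
import copy


class FakePipe:
    """Hypothetical stand-in for the demo's pipeline object (not the magic_pdf API)."""

    def __init__(self, model_list):
        self.model_list = model_list

    def pipe_analyze(self):
        # Pretend the built-in models produced one page of layout detections.
        self.model_list = [{"layout_dets": [{"category_id": 1}]}]

    def pipe_parse(self):
        # Simulate a later stage that edits the model output in place.
        for page in self.model_list:
            page["layout_dets"].clear()


def run_demo(model_json: list) -> list:
    """Mirror the demo's control flow and return what would go to *_model.json."""
    orig_model_list = []
    if model_json:
        # Snapshot user-supplied model data before the pipeline can touch it.
        orig_model_list = copy.deepcopy(model_json)
    pipe = FakePipe(model_json)
    if len(model_json) == 0:
        pipe.pipe_analyze()
        # Snapshot right after analysis, before pipe_parse() runs.
        orig_model_list = copy.deepcopy(pipe.model_list)
    pipe.pipe_parse()
    return orig_model_list
```

With supplied model data, the returned snapshot keeps its detections even though `pipe_parse()` clears the caller's original list.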
--- a/magic-pdf.template.json
+++ b/magic-pdf.template.json
@@ -15,7 +15,7 @@
         "enable": true
     },
     "table-config": {
-        "model": "tablemaster",
+        "model": "rapid_table",
         "enable": false,
         "max_time": 400
     },
--- a/magic_pdf/dict2md/ocr_mkcontent.py
+++ b/magic_pdf/dict2md/ocr_mkcontent.py
@@ -168,7 +168,7 @@ def merge_para_with_text(para_block):
                         # If the previous line ends with a '-' hyphen, no space should be added at the end
                         if __is_hyphen_at_line_end(content):
                             para_text += content[:-1]
-                        elif len(content) == 1 and content not in ['A', 'I', 'a', 'i']:
+                        elif len(content) == 1 and content not in ['A', 'I', 'a', 'i'] and not content.isdigit():
                             para_text += content
                         else:  # in Western-language text, content items must be separated by spaces
                             para_text += f"{content} "
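The patched branch stops treating a lone digit like punctuation: previously a single-character span such as `"3"` was appended without a trailing space, gluing it to the next word. The sketch below mirrors that branch under one assumption: the private `__is_hyphen_at_line_end()` check, whose rule is not shown in this diff, is approximated by "a letter followed by a trailing hyphen". `join_western_spans` and its `space_after_digits` flag are hypothetical; the flag set to `False` reproduces the pre-commit behaviour.

```python
import re
from typing import Iterable


def join_western_spans(contents: Iterable[str], space_after_digits: bool = True) -> str:
    """Join span texts like the patched branch in merge_para_with_text.

    Assumption: r"[A-Za-z]-$" stands in for __is_hyphen_at_line_end().
    """
    para_text = ""
    for content in contents:
        if re.search(r"[A-Za-z]-$", content):
            # Hyphenated line break: drop the hyphen and do not add a space.
            para_text += content[:-1]
        elif (
            len(content) == 1
            and content not in ["A", "I", "a", "i"]
            and not (space_after_digits and content.isdigit())
        ):
            # Other lone characters (e.g. punctuation) get no trailing space.
            para_text += content
        else:
            # Western-language words are separated by spaces.
            para_text += f"{content} "
    return para_text
```

For the spans `["Figure", "3", "shows", "a", "multi-", "line", "plot"]` the new rule gives `"Figure 3 shows a multiline plot "`, while the old rule produced `"Figure 3shows a multiline plot "`.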
--- a/magic_pdf/libs/Constants.py
+++ b/magic_pdf/libs/Constants.py
@@ -50,4 +50,6 @@ class MODEL_NAME:
 
     YOLO_V8_MFD = "yolo_v8_mfd"
 
-    UniMerNet_v2_Small = "unimernet_small"
+    UniMerNet_v2_Small = "unimernet_small"
+
+    RAPID_TABLE = "rapid_table"
--- a/magic_pdf/libs/config_reader.py
+++ b/magic_pdf/libs/config_reader.py
@@ -92,7 +92,7 @@ def get_table_recog_config():
     table_config = config.get('table-config')
     if table_config is None:
         logger.warning(f"'table-config' not found in {CONFIG_FILE_NAME}, use 'False' as default")
-        return json.loads(f'{{"model": "{MODEL_NAME.TABLE_MASTER}","enable": false, "max_time": 400}}')
+        return json.loads(f'{{"model": "{MODEL_NAME.RAPID_TABLE}","enable": false, "max_time": 400}}')
     else:
         return table_config
 
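When `table-config` is missing, `get_table_recog_config()` now falls back to `rapid_table` with table recognition disabled. The sketch below shows the same fallback with two deliberate differences, labelled as assumptions: the parsed config is passed in as an argument (the real function reads it itself), and the default is built as a dict literal instead of `json.loads()` on an f-string, which yields the same value without the brace escaping. It uses the standard `logging` module in place of loguru.

```python
import logging

logger = logging.getLogger(__name__)

RAPID_TABLE = "rapid_table"  # mirrors MODEL_NAME.RAPID_TABLE added in Constants.py


def get_table_recog_config(config: dict) -> dict:
    """Return the "table-config" block, or the rapid_table default (sketch)."""
    table_config = config.get("table-config")
    if table_config is None:
        logger.warning("'table-config' not found in config, use 'False' as default")
        # A fresh dict per call, so callers may modify it safely.
        return {"model": RAPID_TABLE, "enable": False, "max_time": 400}
    return table_config
```

A user-supplied block is returned unchanged (the same object), so its `model`, `enable` and `max_time` values always win over the default.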
--- a/magic_pdf/libs/draw_bbox.py
+++ b/magic_pdf/libs/draw_bbox.py
@@ -369,10 +369,16 @@ def draw_line_sort_bbox(pdf_info, pdf_bytes, out_path, filename):
             if block['type'] in [BlockType.Image, BlockType.Table]:
                 for sub_block in block['blocks']:
                     if sub_block['type'] in [BlockType.ImageBody, BlockType.TableBody]:
-                        for line in sub_block['virtual_lines']:
-                            bbox = line['bbox']
-                            index = line['index']
-                            page_line_list.append({'index': index, 'bbox': bbox})
+                        if len(sub_block['virtual_lines']) > 0 and sub_block['virtual_lines'][0].get('index', None) is not None:
+                            for line in sub_block['virtual_lines']:
+                                bbox = line['bbox']
+                                index = line['index']
+                                page_line_list.append({'index': index, 'bbox': bbox})
+                        else:
+                            for line in sub_block['lines']:
+                                bbox = line['bbox']
+                                index = line['index']
+                                page_line_list.append({'index': index, 'bbox': bbox})
                     elif sub_block['type'] in [BlockType.ImageCaption, BlockType.TableCaption, BlockType.ImageFootnote, BlockType.TableFootnote]:
                         for line in sub_block['lines']:
                             bbox = line['bbox']
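The old loop read `line['index']` from every virtual line, which raises a `KeyError` when virtual lines carry no reading-order index. The new branch uses `virtual_lines` only when the first one has an `index`, and otherwise falls back to the body's regular `lines`. The helper below, `collect_body_lines` (a hypothetical name), isolates that selection; its use of `.get()` for `virtual_lines` is an extra safety step not present in the diff.

```python
from typing import Dict, List


def collect_body_lines(sub_block: Dict) -> List[Dict]:
    """Pick the lines used to draw the reading order of an image/table body.

    Prefer 'virtual_lines' only when the first one carries an 'index';
    otherwise fall back to 'lines', as in the patched draw_line_sort_bbox.
    """
    virtual_lines = sub_block.get("virtual_lines", [])
    if len(virtual_lines) > 0 and virtual_lines[0].get("index", None) is not None:
        source = virtual_lines
    else:
        source = sub_block["lines"]
    return [{"index": line["index"], "bbox": line["bbox"]} for line in source]
```

Like the patch, this checks only the first virtual line, so it assumes indices are either present on all virtual lines or on none.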
--- a/magic_pdf/model/pdf_extract_kit.py
+++ b/magic_pdf/model/pdf_extract_kit.py
@@ -1,195 +1,28 @@
+import numpy as np
+import torch
 from loguru import logger
 import os
 import time
-from pathlib import Path
-import shutil
-from magic_pdf.libs.Constants import *
-from magic_pdf.libs.clean_memory import clean_memory
-from magic_pdf.model.model_list import AtomicModel
+import cv2
+import yaml
+from PIL import Image
 
 os.environ['NO_ALBUMENTATIONS_UPDATE'] = '1'  # disable albumentations update checks
 os.environ['YOLO_VERBOSE'] = 'False'  # disable yolo logger
+
 try:
-    import cv2
-    import yaml
-    import argparse
-    import numpy as np
-    import torch
     import torchtext
 
     if torchtext.__version__ >= "0.18.0":
         torchtext.disable_torchtext_deprecation_warning()
-    from PIL import Image
-    from torchvision import transforms
-    from torch.utils.data import Dataset, DataLoader
-    from ultralytics import YOLO
-    from unimernet.common.config import Config
-    import unimernet.tasks as tasks
-    from unimernet.processors import load_processor
-    from doclayout_yolo import YOLOv10
-
-except ImportError as e:
-    logger.exception(e)
-    logger.error(
-        'Required dependency not installed, please install by \n'
-        '"pip install magic-pdf[full] --extra-index-url https://myhloli.github.io/wheels/"')
-    exit(1)
-
-from magic_pdf.model.pek_sub_modules.layoutlmv3.model_init import Layoutlmv3_Predictor
-from magic_pdf.model.pek_sub_modules.post_process import latex_rm_whitespace
-from magic_pdf.model.pek_sub_modules.self_modify import ModifiedPaddleOCR
-from magic_pdf.model.pek_sub_modules.structeqtable.StructTableModel import StructTableModel
-from magic_pdf.model.ppTableModel import ppTableModel
-
-
-def table_model_init(table_model_type, model_path, max_time, _device_='cpu'):
-    if table_model_type == MODEL_NAME.STRUCT_EQTABLE:
-        table_model = StructTableModel(model_path, max_time=max_time)
-    elif table_model_type == MODEL_NAME.TABLE_MASTER:
-        config = {
-            "model_dir": model_path,
-            "device": _device_
-        }
-        table_model = ppTableModel(config)
-    else:
-        logger.error("table model type not allow")
-        exit(1)
-    return table_model
-
-
-def mfd_model_init(weight):
-    mfd_model = YOLO(weight)
-    return mfd_model
-
-
-def mfr_model_init(weight_dir, cfg_path, _device_='cpu'):
-    args = argparse.Namespace(cfg_path=cfg_path, options=None)
-    cfg = Config(args)
-    cfg.config.model.pretrained = os.path.join(weight_dir, "pytorch_model.pth")
-    cfg.config.model.model_config.model_name = weight_dir
-    cfg.config.model.tokenizer_config.path = weight_dir
-    task = tasks.setup_task(cfg)
-    model = task.build_model(cfg)
-    model.to(_device_)
-    model.eval()
-    vis_processor = load_processor('formula_image_eval', cfg.config.datasets.formula_rec_eval.vis_processor.eval)
-    mfr_transform = transforms.Compose([vis_processor, ])
-    return [model, mfr_transform]
-
-
-def layout_model_init(weight, config_file, device):
-    model = Layoutlmv3_Predictor(weight, config_file, device)
-    return model
-
-
-def doclayout_yolo_model_init(weight):
-    model = YOLOv10(weight)
-    return model
-
-
-def ocr_model_init(show_log: bool = False, det_db_box_thresh=0.3, lang=None, use_dilation=True, det_db_unclip_ratio=1.8):
-    if lang is not None:
-        model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=det_db_box_thresh, lang=lang, use_dilation=use_dilation, det_db_unclip_ratio=det_db_unclip_ratio)
-    else:
-        model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=det_db_box_thresh, use_dilation=use_dilation, det_db_unclip_ratio=det_db_unclip_ratio)
-    return model
-
-
-class MathDataset(Dataset):
-    def __init__(self, image_paths, transform=None):
-        self.image_paths = image_paths
-        self.transform = transform
-
-    def __len__(self):
-        return len(self.image_paths)
-
-    def __getitem__(self, idx):
-        # if not pil image, then convert to pil image
-        if isinstance(self.image_paths[idx], str):
-            raw_image = Image.open(self.image_paths[idx])
-        else:
-            raw_image = self.image_paths[idx]
-        if self.transform:
-            image = self.transform(raw_image)
-            return image
-
-
-class AtomModelSingleton:
-    _instance = None
-    _models = {}
-
-    def __new__(cls, *args, **kwargs):
-        if cls._instance is None:
-            cls._instance = super().__new__(cls)
-        return cls._instance
-
-    def get_atom_model(self, atom_model_name: str, **kwargs):
-        lang = kwargs.get("lang", None)
-        layout_model_name = kwargs.get("layout_model_name", None)
-        key = (atom_model_name, layout_model_name, lang)
-        if key not in self._models:
-            self._models[key] = atom_model_init(model_name=atom_model_name, **kwargs)
-        return self._models[key]
-
-
-def atom_model_init(model_name: str, **kwargs):
-
-    if model_name == AtomicModel.Layout:
-        if kwargs.get("layout_model_name") == MODEL_NAME.LAYOUTLMv3:
-            atom_model = layout_model_init(
-                kwargs.get("layout_weights"),
-                kwargs.get("layout_config_file"),
-                kwargs.get("device")
-            )
-        elif kwargs.get("layout_model_name") == MODEL_NAME.DocLayout_YOLO:
-            atom_model = doclayout_yolo_model_init(
-                kwargs.get("doclayout_yolo_weights"),
-            )
-    elif model_name == AtomicModel.MFD:
-        atom_model = mfd_model_init(
-            kwargs.get("mfd_weights")
-        )
-    elif model_name == AtomicModel.MFR:
-        atom_model = mfr_model_init(
-            kwargs.get("mfr_weight_dir"),
-            kwargs.get("mfr_cfg_path"),
-            kwargs.get("device")
-        )
-    elif model_name == AtomicModel.OCR:
-        atom_model = ocr_model_init(
-            kwargs.get("ocr_show_log"),
-            kwargs.get("det_db_box_thresh"),
-            kwargs.get("lang")
-        )
-    elif model_name == AtomicModel.Table:
-        atom_model = table_model_init(
-            kwargs.get("table_model_name"),
-            kwargs.get("table_model_path"),
-            kwargs.get("table_max_time"),
-            kwargs.get("device")
-        )
-    else:
-        logger.error("model name not allow")
-        exit(1)
-
-    return atom_model
-
+except ImportError:
+    pass
 
-#  Unified crop img logic
-def crop_img(input_res, input_pil_img, crop_paste_x=0, crop_paste_y=0):
-    crop_xmin, crop_ymin = int(input_res['poly'][0]), int(input_res['poly'][1])
-    crop_xmax, crop_ymax = int(input_res['poly'][4]), int(input_res['poly'][5])
-    # Create a white background with an additional width and height of 50
-    crop_new_width = crop_xmax - crop_xmin + crop_paste_x * 2
-    crop_new_height = crop_ymax - crop_ymin + crop_paste_y * 2
-    return_image = Image.new('RGB', (crop_new_width, crop_new_height), 'white')
-
-    # Crop image
-    crop_box = (crop_xmin, crop_ymin, crop_xmax, crop_ymax)
-    cropped_img = input_pil_img.crop(crop_box)
-    return_image.paste(cropped_img, (crop_paste_x, crop_paste_y))
-    return_list = [crop_paste_x, crop_paste_y, crop_xmin, crop_ymin, crop_xmax, crop_ymax, crop_new_width, crop_new_height]
-    return return_image, return_list
+from magic_pdf.libs.Constants import *
+from magic_pdf.model.model_list import AtomicModel
+from magic_pdf.model.sub_modules.model_init import AtomModelSingleton
+from magic_pdf.model.sub_modules.model_utils import get_res_list_from_layout_res, crop_img, clean_vram
+from magic_pdf.model.sub_modules.ocr.paddleocr.ocr_utils import get_adjusted_mfdetrec_res, get_ocr_result_list
 
 
 class CustomPEKModel:
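`crop_img` moves out of this file and is now imported from `magic_pdf.model.sub_modules.model_utils`. The sketch below reproduces only the coordinate bookkeeping of the removed version, leaving out the PIL crop and the paste onto a white canvas, so the padding arithmetic is explicit. Note that `crop_paste_x`/`crop_paste_y` are added on both sides, so the canvas grows by twice the padding; the removed comment's "additional width and height of 50" refers to the per-side padding used by the OCR call further down. `crop_geometry` is a hypothetical name.

```python
from typing import List, Sequence


def crop_geometry(poly: Sequence[float], crop_paste_x: int = 0, crop_paste_y: int = 0) -> List[int]:
    """Return the same 8-value list as the removed crop_img() (no image work).

    poly is the 8-number quadrilateral used throughout this file; only the
    top-left (poly[0], poly[1]) and bottom-right (poly[4], poly[5]) corners matter.
    """
    crop_xmin, crop_ymin = int(poly[0]), int(poly[1])
    crop_xmax, crop_ymax = int(poly[4]), int(poly[5])
    # Padding is applied on both sides, so the canvas grows by 2 * padding.
    crop_new_width = crop_xmax - crop_xmin + crop_paste_x * 2
    crop_new_height = crop_ymax - crop_ymin + crop_paste_y * 2
    return [crop_paste_x, crop_paste_y, crop_xmin, crop_ymin,
            crop_xmax, crop_ymax, crop_new_width, crop_new_height]
```

For a 100×50 region with `crop_paste_x=crop_paste_y=50`, the result describes a 200×150 canvas with the region pasted at offset (50, 50).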
@@ -226,7 +59,7 @@ class CustomPEKModel:
         self.table_config = kwargs.get("table_config")
         self.apply_table = self.table_config.get("enable", False)
         self.table_max_time = self.table_config.get("max_time", TABLE_MAX_TIME_VALUE)
-        self.table_model_name = self.table_config.get("model", MODEL_NAME.TABLE_MASTER)
+        self.table_model_name = self.table_config.get("model", MODEL_NAME.RAPID_TABLE)
 
         # ocr config
         self.apply_ocr = ocr
@@ -235,7 +68,8 @@ class CustomPEKModel:
         logger.info(
             "DocAnalysis init, this may take some times, layout_model: {}, apply_formula: {}, apply_ocr: {}, "
             "apply_table: {}, table_model: {}, lang: {}".format(
-                self.layout_model_name, self.apply_formula, self.apply_ocr, self.apply_table, self.table_model_name, self.lang
+                self.layout_model_name, self.apply_formula, self.apply_ocr, self.apply_table, self.table_model_name,
+                self.lang
             )
         )
         # Initialize the parsing scheme
@@ -248,17 +82,17 @@ class CustomPEKModel:
 
         # Initialize formula recognition
         if self.apply_formula:
-
             # Initialize the formula detection model
             self.mfd_model = atom_model_manager.get_atom_model(
                 atom_model_name=AtomicModel.MFD,
-                mfd_weights=str(os.path.join(models_dir, self.configs["weights"][self.mfd_model_name]))
+                mfd_weights=str(os.path.join(models_dir, self.configs["weights"][self.mfd_model_name])),
+                device=self.device
             )
 
             # Initialize the formula recognition model
             mfr_weight_dir = str(os.path.join(models_dir, self.configs["weights"][self.mfr_model_name]))
             mfr_cfg_path = str(os.path.join(model_config_dir, "UniMERNet", "demo.yaml"))
-            self.mfr_model, self.mfr_transform = atom_model_manager.get_atom_model(
+            self.mfr_model = atom_model_manager.get_atom_model(
                 atom_model_name=AtomicModel.MFR,
                 mfr_weight_dir=mfr_weight_dir,
                 mfr_cfg_path=mfr_cfg_path,
@@ -278,7 +112,8 @@ class CustomPEKModel:
             self.layout_model = atom_model_manager.get_atom_model(
                 atom_model_name=AtomicModel.Layout,
                 layout_model_name=MODEL_NAME.DocLayout_YOLO,
-                doclayout_yolo_weights=str(os.path.join(models_dir, self.configs['weights'][self.layout_model_name]))
+                doclayout_yolo_weights=str(os.path.join(models_dir, self.configs['weights'][self.layout_model_name])),
+                device=self.device
             )
         # Initialize OCR
         if self.apply_ocr:
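These hunks now pass `device=self.device` to `get_atom_model` for the MFD and DocLayout-YOLO models. In the `AtomModelSingleton` removed earlier in this diff, the cache key was `(atom_model_name, layout_model_name, lang)`, so the device was forwarded to the initializer but was not part of the key; whether the relocated class in `magic_pdf.model.sub_modules.model_init` keeps that key is not visible here. The sketch below mirrors the removed caching logic, with a hypothetical injected `factory` in place of `atom_model_init`; model names such as `"mfd"` used with it are placeholders for the `AtomicModel` constants.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class AtomModelCache:
    """Process-wide model cache modelled on the removed AtomModelSingleton."""

    _instance: Optional["AtomModelCache"] = None
    # Class-level dict: shared by every handle to the singleton.
    _models: Dict[Tuple[Any, Any, Any], Any] = {}

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get_atom_model(self, factory: Callable[..., Any], atom_model_name: str, **kwargs) -> Any:
        # Same key as the removed code: the device is NOT part of it.
        key = (atom_model_name, kwargs.get("layout_model_name"), kwargs.get("lang"))
        if key not in self._models:
            self._models[key] = factory(model_name=atom_model_name, **kwargs)
        return self._models[key]
```

A consequence of this key: a second request that differs only in `device` returns the model already built for the first device.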
@@ -305,26 +140,15 @@ class CustomPEKModel:
 
         page_start = time.time()
 
-        latex_filling_list = []
-        mf_image_list = []
-
         # Layout detection
         layout_start = time.time()
+        layout_res = []
         if self.layout_model_name == MODEL_NAME.LAYOUTLMv3:
             # layoutlmv3
             layout_res = self.layout_model(image, ignore_catids=[])
         elif self.layout_model_name == MODEL_NAME.DocLayout_YOLO:
             # doclayout_yolo
-            layout_res = []
-            doclayout_yolo_res = self.layout_model.predict(image, imgsz=1024, conf=0.25, iou=0.45, verbose=True, device=self.device)[0]
-            for xyxy, conf, cla in zip(doclayout_yolo_res.boxes.xyxy.cpu(), doclayout_yolo_res.boxes.conf.cpu(), doclayout_yolo_res.boxes.cls.cpu()):
-                xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-                new_item = {
-                    'category_id': int(cla.item()),
-                    'poly': [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
-                    'score': round(float(conf.item()), 3),
-                }
-                layout_res.append(new_item)
+            layout_res = self.layout_model.predict(image)
         layout_cost = round(time.time() - layout_start, 2)
         logger.info(f"layout detection time: {layout_cost}")
 
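The inline loop that turned DocLayout-YOLO boxes into `layout_res` items is replaced by a single `self.layout_model.predict(image)` call, and the formula-detection hunk that follows does the same for MFD. The sketch below restates the removed conversion in plain Python, with ordinary numbers standing in for the torch tensors; `boxes_to_layout_items` is a hypothetical name. Per the removed code, layout boxes used the raw class id and 3-digit scores, while formula boxes added 13 to the class id, used 2-digit scores and an empty `'latex'` field (omitted here).

```python
from typing import Iterable, List, Sequence


def boxes_to_layout_items(
    xyxy: Iterable[Sequence[float]],
    confs: Iterable[float],
    classes: Iterable[float],
    category_offset: int = 0,
    score_digits: int = 3,
) -> List[dict]:
    """Convert detector boxes into the layout_res item format used in this file."""
    items = []
    for (x0, y0, x1, y1), conf, cls in zip(xyxy, confs, classes):
        # int() truncates, matching int(p.item()) in the removed code.
        xmin, ymin, xmax, ymax = int(x0), int(y0), int(x1), int(y1)
        items.append({
            "category_id": category_offset + int(cls),
            # Four corners, clockwise from top-left, flattened to 8 numbers.
            "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
            "score": round(float(conf), score_digits),
        })
    return items
```

Calling it with `category_offset=13, score_digits=2` reproduces the removed formula-detection variant apart from the `'latex'` placeholder.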
@@ -333,59 +157,21 @@ class CustomPEKModel:
         if self.apply_formula:
             # Formula detection
             mfd_start = time.time()
-            mfd_res = self.mfd_model.predict(image, imgsz=1888, conf=0.25, iou=0.45, verbose=True, device=self.device)[0]
+            mfd_res = self.mfd_model.predict(image)
             logger.info(f"mfd time: {round(time.time() - mfd_start, 2)}")
-            for xyxy, conf, cla in zip(mfd_res.boxes.xyxy.cpu(), mfd_res.boxes.conf.cpu(), mfd_res.boxes.cls.cpu()):
-                xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
-                new_item = {
-                    'category_id': 13 + int(cla.item()),
-                    'poly': [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
-                    'score': round(float(conf.item()), 2),
-                    'latex': '',
-                }
-                layout_res.append(new_item)
-                latex_filling_list.append(new_item)
-                bbox_img = pil_img.crop((xmin, ymin, xmax, ymax))
-                mf_image_list.append(bbox_img)
 
             # Formula recognition
             mfr_start = time.time()
-            dataset = MathDataset(mf_image_list, transform=self.mfr_transform)
-            dataloader = DataLoader(dataset, batch_size=64, num_workers=0)
-            mfr_res = []
-            for mf_img in dataloader:
-                mf_img = mf_img.to(self.device)
-                with torch.no_grad():
-                    output = self.mfr_model.generate({'image': mf_img})
-                mfr_res.extend(output['pred_str'])
-            for res, latex in zip(latex_filling_list, mfr_res):
-                res['latex'] = latex_rm_whitespace(latex)
+            formula_list = self.mfr_model.predict(mfd_res, image)
+            layout_res.extend(formula_list)
             mfr_cost = round(time.time() - mfr_start, 2)
-            logger.info(f"formula nums: {len(mf_image_list)}, mfr time: {mfr_cost}")
-
-        # Select regions for OCR / formula regions / table regions
-        ocr_res_list = []
-        table_res_list = []
-        single_page_mfdetrec_res = []
-        for res in layout_res:
-            if int(res['category_id']) in [13, 14]:
-                single_page_mfdetrec_res.append({
-                    "bbox": [int(res['poly'][0]), int(res['poly'][1]),
-                             int(res['poly'][4]), int(res['poly'][5])],
-                })
-            elif int(res['category_id']) in [0, 1, 2, 4, 6, 7]:
-                ocr_res_list.append(res)
-            elif int(res['category_id']) in [5]:
-                table_res_list.append(res)
-
-        if torch.cuda.is_available() and self.device != 'cpu':
-            properties = torch.cuda.get_device_properties(self.device)
-            total_memory = properties.total_memory / (1024 ** 3)  # convert bytes to GB
-            if total_memory <= 10:
-                gc_start = time.time()
-                clean_memory()
-                gc_time = round(time.time() - gc_start, 2)
-                logger.info(f"gc time: {gc_time}")
+            logger.info(f"formula nums: {len(formula_list)}, mfr time: {mfr_cost}")
+
+        # Free GPU memory
+        clean_vram(self.device, vram_threshold=8)
+
+        # Get the OCR regions, table regions, and formula regions from layout_res
+        ocr_res_list, table_res_list, single_page_mfdetrec_res = get_res_list_from_layout_res(layout_res)
 
         # OCR recognition
         if self.apply_ocr:
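Region selection now happens in `get_res_list_from_layout_res(layout_res)`. Its new implementation is not part of this diff, so the sketch below mirrors the removed inline loop instead: category ids 13 and 14 are formula boxes reduced to an axis-aligned `bbox`, ids 0, 1, 2, 4, 6 and 7 go to OCR, id 5 goes to table recognition, and anything else is ignored. `split_layout_res` and the `*_CATEGORY_IDS` constants are hypothetical names.

```python
from typing import Dict, List, Tuple

# Category groupings copied from the removed inline loop.
FORMULA_CATEGORY_IDS = {13, 14}
OCR_CATEGORY_IDS = {0, 1, 2, 4, 6, 7}
TABLE_CATEGORY_IDS = {5}


def split_layout_res(layout_res: List[Dict]) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Return (ocr_res_list, table_res_list, single_page_mfdetrec_res)."""
    ocr_res_list, table_res_list, single_page_mfdetrec_res = [], [], []
    for res in layout_res:
        category_id = int(res["category_id"])
        if category_id in FORMULA_CATEGORY_IDS:
            poly = res["poly"]
            # Formulas only need the top-left and bottom-right corners.
            single_page_mfdetrec_res.append(
                {"bbox": [int(poly[0]), int(poly[1]), int(poly[4]), int(poly[5])]}
            )
        elif category_id in OCR_CATEGORY_IDS:
            ocr_res_list.append(res)
        elif category_id in TABLE_CATEGORY_IDS:
            table_res_list.append(res)
        # Other categories (e.g. 3) are skipped, as in the removed code.
    return ocr_res_list, table_res_list, single_page_mfdetrec_res
```

OCR and table entries are passed through as the same dict objects, so later stages can still read their `poly` and `score`.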
@@ -393,23 +179,7 @@ class CustomPEKModel:
             # Process each area that requires OCR processing
             for res in ocr_res_list:
                 new_image, useful_list = crop_img(res, pil_img, crop_paste_x=50, crop_paste_y=50)
-                paste_x, paste_y, xmin, ymin, xmax, ymax, new_width, new_height = useful_list
-                # Adjust the coordinates of the formula area
-                adjusted_mfdetrec_res = []
-                for mf_res in single_page_mfdetrec_res:
-                    mf_xmin, mf_ymin, mf_xmax, mf_ymax = mf_res["bbox"]
-                    # Adjust the coordinates of the formula area to the coordinates relative to the cropping area
-                    x0 = mf_xmin - xmin + paste_x
-                    y0 = mf_ymin - ymin + paste_y
-                    x1 = mf_xmax - xmin + paste_x
-                    y1 = mf_ymax - ymin + paste_y
-                    # Filter formula blocks outside the graph
-                    if any([x1 < 0, y1 < 0]) or any([x0 > new_width, y0 > new_height]):
-                        continue
-                    else:
-                        adjusted_mfdetrec_res.append({
-                            "bbox": [x0, y0, x1, y1],
-                        })
+                adjusted_mfdetrec_res = get_adjusted_mfdetrec_res(single_page_mfdetrec_res, useful_list)
 
                 # OCR recognition
                 new_image = cv2.cvtColor(np.asarray(new_image), cv2.COLOR_RGB2BGR)
@@ -417,22 +187,8 @@ class CustomPEKModel:
 
                 # Integration results
                 if ocr_res:
-                    for box_ocr_res in ocr_res:
-                        p1, p2, p3, p4 = box_ocr_res[0]
-                        text, score = box_ocr_res[1]
-
-                        # Convert the coordinates back to the original coordinate system
-                        p1 = [p1[0] - paste_x + xmin, p1[1] - paste_y + ymin]
-                        p2 = [p2[0] - paste_x + xmin, p2[1] - paste_y + ymin]
-                        p3 = [p3[0] - paste_x + xmin, p3[1] - paste_y + ymin]
-                        p4 = [p4[0] - paste_x + xmin, p4[1] - paste_y + ymin]
-
-                        layout_res.append({
-                            'category_id': 15,
-                            'poly': p1 + p2 + p3 + p4,
-                            'score': round(score, 2),
-                            'text': text,
-                        })
+                    ocr_result_list = get_ocr_result_list(ocr_res, useful_list)
+                    layout_res.extend(ocr_result_list)
 
             ocr_cost = round(time.time() - ocr_start, 2)
             logger.info(f"ocr time: {ocr_cost}")
@@ -443,41 +199,30 @@ class CustomPEKModel:
             for res in table_res_list:
                 new_image, _ = crop_img(res, pil_img)
                 single_table_start_time = time.time()
-                # logger.info("------------------table recognition processing begins-----------------")
-                latex_code = None
                 html_code = None
                 if self.table_model_name == MODEL_NAME.STRUCT_EQTABLE:
                     with torch.no_grad():
                         table_result = self.table_model.predict(new_image, "html")
                         if len(table_result) > 0:
                             html_code = table_result[0]
-                else:
+                elif self.table_model_name == MODEL_NAME.TABLE_MASTER:
                     html_code = self.table_model.img2html(new_image)
-
+                elif self.table_model_name == MODEL_NAME.RAPID_TABLE:
+                    html_code, table_cell_bboxes, elapse = self.table_model.predict(new_image)
                 run_time = time.time() - single_table_start_time
-                # logger.info(f"------------table recognition processing ends within {run_time}s-----")
                 if run_time > self.table_max_time:
-                    logger.warning(f"------------table recognition processing exceeds max time {self.table_max_time}s----------")
+                    logger.warning(f"table recognition processing exceeds max time {self.table_max_time}s")
                 # Check whether a valid result was returned
-
-                if latex_code:
-                    expected_ending = latex_code.strip().endswith('end{tabular}') or latex_code.strip().endswith('end{table}')
-                    if expected_ending:
-                        res["latex"] = latex_code
-                    else:
-                        logger.warning(f"table recognition processing fails, not found expected LaTeX table end")
-                elif html_code:
+                if html_code:
                     expected_ending = html_code.strip().endswith('</html>') or html_code.strip().endswith('</table>')
                     if expected_ending:
                         res["html"] = html_code
                     else:
                         logger.warning(f"table recognition processing fails, not found expected HTML table end")
                 else:
-                    logger.warning(f"table recognition processing fails, not get latex or html return")
+                    logger.warning(f"table recognition processing fails, not get html return")
             logger.info(f"table time: {round(time.time() - table_start, 2)}")
 
         logger.info(f"-----page total time: {round(time.time() - page_start, 2)}-----")
 
         return layout_res
-
-
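The refactor above moves two inline loops into `get_adjusted_mfdetrec_res` and `get_ocr_result_list`. Their bodies are not part of this hunk. The sketch below assumes those helpers keep the behavior of the removed inline code. Under that assumption, the two mappings are inverses built on the same `useful_list` returned by `crop_img`, which the removed line unpacks as `paste_x, paste_y, xmin, ymin, xmax, ymax, new_width, new_height`. The names `to_crop_coords` and `to_page_coords` are hypothetical; this is not the code the commit actually adds.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical re-creation of the removed inline logic; not the actual helper bodies.
# useful_list is assumed to be (paste_x, paste_y, xmin, ymin, xmax, ymax, new_width, new_height).


def to_crop_coords(formula_results: List[Dict[str, Any]], useful_list: Sequence[int]) -> List[Dict[str, Any]]:
    """Shift page-space formula boxes into the padded crop, dropping boxes outside it."""
    paste_x, paste_y, xmin, ymin, _xmax, _ymax, new_width, new_height = useful_list
    adjusted = []
    for mf_res in formula_results:
        mf_xmin, mf_ymin, mf_xmax, mf_ymax = mf_res["bbox"]
        x0 = mf_xmin - xmin + paste_x
        y0 = mf_ymin - ymin + paste_y
        x1 = mf_xmax - xmin + paste_x
        y1 = mf_ymax - ymin + paste_y
        # Entirely left of / above the crop, or starting past its right / bottom edge
        if x1 < 0 or y1 < 0 or x0 > new_width or y0 > new_height:
            continue
        adjusted.append({"bbox": [x0, y0, x1, y1]})
    return adjusted


def to_page_coords(ocr_res: List[Any], useful_list: Sequence[int]) -> List[Dict[str, Any]]:
    """Map OCR quadrilaterals from crop space back to page space as layout entries."""
    paste_x, paste_y, xmin, ymin = useful_list[:4]
    results = []
    for points, (text, score) in ocr_res:
        poly: List[float] = []
        for x, y in points:  # four (x, y) corner points per box
            poly.extend([x - paste_x + xmin, y - paste_y + ymin])
        results.append({
            "category_id": 15,  # category the removed code used for OCR text spans
            "poly": poly,
            "score": round(score, 2),
            "text": text,
        })
    return results
```

For example, take `useful_list = [50, 50, 100, 200, 300, 400, 300, 300]`. A formula at page coordinates `[120, 220, 160, 240]` lands at `[70, 70, 110, 90]` inside the padded crop. An OCR box at those crop coordinates maps back to the original page position.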
			
--- a/magic_pdf/model/pek_sub_modules/post_process.py
+++ b/magic_pdf/model/pek_sub_modules/post_process.py
@@ -1,36 +0,0 @@
-import re
-
-def layout_rm_equation(layout_res):
-    rm_idxs = []
-    for idx, ele in enumerate(layout_res['layout_dets']):
-        if ele['category_id'] == 10:
-            rm_idxs.append(idx)
-    
-    for idx in rm_idxs[::-1]:
-        del layout_res['layout_dets'][idx]
-    return layout_res
-
-
-def get_croped_image(image_pil, bbox):
-    x_min, y_min, x_max, y_max = bbox
-    croped_img = image_pil.crop((x_min, y_min, x_max, y_max))
-    return croped_img
-
-
-def latex_rm_whitespace(s: str):
-    """Remove unnecessary whitespace from LaTeX code.
-    """
-    text_reg = r'(\\(operatorname|mathrm|text|mathbf)\s?\*? {.*?})'
-    letter = '[a-zA-Z]'
-    noletter = '[\W_^\d]'
-    names = [x[0].replace(' ', '') for x in re.findall(text_reg, s)]
-    s = re.sub(text_reg, lambda match: str(names.pop(0)), s)
-    news = s
-    while True:
-        s = news
-        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, noletter), r'\1\2', s)
-        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, letter), r'\1\2', news)
-        news = re.sub(r'(%s)\s+?(%s)' % (letter, noletter), r'\1\2', news)
-        if news == s:
-            break
-    return s
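`latex_rm_whitespace` is not lost with this file: an identical copy is added to `magic_pdf/model/sub_modules/mfr/unimernet/Unimernet.py` later in this diff.

The function works in two stages:
- It collapses command groups such as `\mathrm {d}` to `\mathrm{d}`.
- It then repeatedly deletes whitespace between a non-letter and the following character, or between a letter and a non-letter, until the string stops changing.

Whitespace between two letters (as in `\alpha x`) is kept, because LaTeX needs it to end a command name.

The copy below is the removed function with one change: `noletter` is written as a raw string so that `\W` and `\d` do not trigger Python's invalid-escape warning. The regular expression it produces is the same.

```python
import re


def latex_rm_whitespace(s: str) -> str:
    """Remove unnecessary whitespace from LaTeX code (copied from the diff)."""
    # Matches \operatorname, \mathrm, \text and \mathbf groups written with a space before "{"
    text_reg = r'(\\(operatorname|mathrm|text|mathbf)\s?\*? {.*?})'
    letter = '[a-zA-Z]'
    noletter = r'[\W_^\d]'  # raw string: same pattern, no invalid-escape warning
    names = [x[0].replace(' ', '') for x in re.findall(text_reg, s)]
    s = re.sub(text_reg, lambda match: str(names.pop(0)), s)
    news = s
    # Apply the three substitutions until the string reaches a fixed point
    while True:
        s = news
        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, noletter), r'\1\2', s)
        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, letter), r'\1\2', news)
        news = re.sub(r'(%s)\s+?(%s)' % (letter, noletter), r'\1\2', news)
        if news == s:
            break
    return s
```

For example, `"x ^ { 2 }"` becomes `"x^{2}"` and `r"\mathrm {d} x"` becomes `r"\mathrm{d}x"`. The `(?!\\ )` lookahead keeps an explicit `\ ` space intact.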
			
--- a/magic_pdf/model/pek_sub_modules/self_modify.py
+++ b/magic_pdf/model/pek_sub_modules/self_modify.py
@@ -1,388 +0,0 @@
-import time
-import copy
-import base64
-import cv2
-import numpy as np
-from io import BytesIO
-from PIL import Image
-
-from paddleocr import PaddleOCR
-from paddleocr.ppocr.utils.logging import get_logger
-from paddleocr.ppocr.utils.utility import check_and_read, alpha_to_color, binarize_img
-from paddleocr.tools.infer.utility import draw_ocr_box_txt, get_rotate_crop_image, get_minarea_rect_crop
-
-from magic_pdf.libs.boxbase import __is_overlaps_y_exceeds_threshold
-from magic_pdf.pre_proc.ocr_dict_merge import merge_spans_to_line
-
-logger = get_logger()
-
-
-def img_decode(content: bytes):
-    np_arr = np.frombuffer(content, dtype=np.uint8)
-    return cv2.imdecode(np_arr, cv2.IMREAD_UNCHANGED)
-
-
-def check_img(img):
-    if isinstance(img, bytes):
-        img = img_decode(img)
-    if isinstance(img, str):
-        image_file = img
-        img, flag_gif, flag_pdf = check_and_read(image_file)
-        if not flag_gif and not flag_pdf:
-            with open(image_file, 'rb') as f:
-                img_str = f.read()
-                img = img_decode(img_str)
-            if img is None:
-                try:
-                    buf = BytesIO()
-                    image = BytesIO(img_str)
-                    im = Image.open(image)
-                    rgb = im.convert('RGB')
-                    rgb.save(buf, 'jpeg')
-                    buf.seek(0)
-                    image_bytes = buf.read()
-                    data_base64 = str(base64.b64encode(image_bytes),
-                                      encoding="utf-8")
-                    image_decode = base64.b64decode(data_base64)
-                    img_array = np.frombuffer(image_decode, np.uint8)
-                    img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
-                except:
-                    logger.error("error in loading image:{}".format(image_file))
-                    return None
-        if img is None:
-            logger.error("error in loading image:{}".format(image_file))
-            return None
-    if isinstance(img, np.ndarray) and len(img.shape) == 2:
-        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
-
-    return img
-
-
-def sorted_boxes(dt_boxes):
-    """
-    Sort text boxes in order from top to bottom, left to right
-    args:
-        dt_boxes(array):detected text boxes with shape [4, 2]
-    return:
-        sorted boxes(array) with shape [4, 2]
-    """
-    num_boxes = dt_boxes.shape[0]
-    sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
-    _boxes = list(sorted_boxes)
-
-    for i in range(num_boxes - 1):
-        for j in range(i, -1, -1):
-            if abs(_boxes[j + 1][0][1] - _boxes[j][0][1]) < 10 and \
-                    (_boxes[j + 1][0][0] < _boxes[j][0][0]):
-                tmp = _boxes[j]
-                _boxes[j] = _boxes[j + 1]
-                _boxes[j + 1] = tmp
-            else:
-                break
-    return _boxes
-
-
-def bbox_to_points(bbox):
-    """ Convert a bbox into an array of its four corner points """
-    x0, y0, x1, y1 = bbox
-    return np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]]).astype('float32')
-
-
-def points_to_bbox(points):
-    """ Convert an array of four corner points into a bbox """
-    x0, y0 = points[0]
-    x1, _ = points[1]
-    _, y1 = points[2]
-    return [x0, y0, x1, y1]
-
-
-def merge_intervals(intervals):
-    # Sort the intervals based on the start value
-    intervals.sort(key=lambda x: x[0])
-
-    merged = []
-    for interval in intervals:
-        # If the list of merged intervals is empty or if the current
-        # interval does not overlap with the previous, simply append it.
-        if not merged or merged[-1][1] < interval[0]:
-            merged.append(interval)
-        else:
-            # Otherwise, there is overlap, so we merge the current and previous intervals.
-            merged[-1][1] = max(merged[-1][1], interval[1])
-
-    return merged
-
-
-def remove_intervals(original, masks):
-    # Merge all mask intervals
-    merged_masks = merge_intervals(masks)
-
-    result = []
-    original_start, original_end = original
-
-    for mask in merged_masks:
-        mask_start, mask_end = mask
-
-        # If the mask starts after the original range, ignore it
-        if mask_start > original_end:
-            continue
-
-        # If the mask ends before the original range starts, ignore it
-        if mask_end < original_start:
-            continue
-
-        # Remove the masked part from the original range
-        if original_start < mask_start:
-            result.append([original_start, mask_start - 1])
-
-        original_start = max(mask_end + 1, original_start)
-
-    # Add the remaining part of the original range, if any
-    if original_start <= original_end:
-        result.append([original_start, original_end])
-
-    return result
-
-
-def update_det_boxes(dt_boxes, mfd_res):
-    new_dt_boxes = []
-    for text_box in dt_boxes:
-        text_bbox = points_to_bbox(text_box)
-        masks_list = []
-        for mf_box in mfd_res:
-            mf_bbox = mf_box['bbox']
-            if __is_overlaps_y_exceeds_threshold(text_bbox, mf_bbox):
-                masks_list.append([mf_bbox[0], mf_bbox[2]])
-        text_x_range = [text_bbox[0], text_bbox[2]]
-        text_remove_mask_range = remove_intervals(text_x_range, masks_list)
-        temp_dt_box = []
-        for text_remove_mask in text_remove_mask_range:
-            temp_dt_box.append(bbox_to_points([text_remove_mask[0], text_bbox[1], text_remove_mask[1], text_bbox[3]]))
-        if len(temp_dt_box) > 0:
-            new_dt_boxes.extend(temp_dt_box)
-    return new_dt_boxes
-
-
-def merge_overlapping_spans(spans):
-    """
-    Merges overlapping spans on the same line.
-
-    :param spans: A list of span coordinates [(x1, y1, x2, y2), ...]
-    :return: A list of merged spans
-    """
-    # Return an empty list if the input spans list is empty
-    if not spans:
-        return []
-
-    # Sort spans by their starting x-coordinate
-    spans.sort(key=lambda x: x[0])
-
-    # Initialize the list of merged spans
-    merged = []
-    for span in spans:
-        # Unpack span coordinates
-        x1, y1, x2, y2 = span
-        # If the merged list is empty or there's no horizontal overlap, add the span directly
-        if not merged or merged[-1][2] < x1:
-            merged.append(span)
-        else:
-            # If there is horizontal overlap, merge the current span with the previous one
-            last_span = merged.pop()
-            # Update the merged span's top-left corner to the smaller (x1, y1) and bottom-right to the larger (x2, y2)
-            x1 = min(last_span[0], x1)
-            y1 = min(last_span[1], y1)
-            x2 = max(last_span[2], x2)
-            y2 = max(last_span[3], y2)
-            # Add the merged span back to the list
-            merged.append((x1, y1, x2, y2))
-
-    # Return the list of merged spans
-    return merged
-
-
-def merge_det_boxes(dt_boxes):
-    """
-    Merge detection boxes.
-
-    This function takes a list of detected bounding boxes, each represented by four corner points.
-    The goal is to merge these bounding boxes into larger text regions.
-
-    Parameters:
-    dt_boxes (list): A list containing multiple text detection boxes, where each box is defined by four corner points.
-
-    Returns:
-    list: A list containing the merged text regions, where each region is represented by four corner points.
-    """
-    # Convert the detection boxes into a dictionary format with bounding boxes and type
-    dt_boxes_dict_list = []
-    for text_box in dt_boxes:
-        text_bbox = points_to_bbox(text_box)
-        text_box_dict = {
-            'bbox': text_bbox,
-            'type': 'text',
-        }
-        dt_boxes_dict_list.append(text_box_dict)
-
-    # Merge adjacent text regions into lines
-    lines = merge_spans_to_line(dt_boxes_dict_list)
-
-    # Initialize a new list for storing the merged text regions
-    new_dt_boxes = []
-    for line in lines:
-        line_bbox_list = []
-        for span in line:
-            line_bbox_list.append(span['bbox'])
-
-        # Merge overlapping text regions within the same line
-        merged_spans = merge_overlapping_spans(line_bbox_list)
-
-        # Convert the merged text regions back to point format and add them to the new detection box list
-        for span in merged_spans:
-            new_dt_boxes.append(bbox_to_points(span))
-
-    return new_dt_boxes
-
-
-class ModifiedPaddleOCR(PaddleOCR):
-    def ocr(self, img, det=True, rec=True, cls=True, bin=False, inv=False, mfd_res=None, alpha_color=(255, 255, 255)):
-        """
-        OCR with PaddleOCR
-        args:
-            img: img for OCR, support ndarray, img_path and list or ndarray
-            det: use text detection or not. If False, only rec will be exec. Default is True
-            rec: use text recognition or not. If False, only det will be exec. Default is True
-            cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
-            bin: binarize image to black and white. Default is False.
-            inv: invert image colors. Default is False.
-            alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white.
-        """
-        assert isinstance(img, (np.ndarray, list, str, bytes))
-        if isinstance(img, list) and det == True:
-            logger.error('When input a list of images, det must be false')
-            exit(0)
-        if cls == True and self.use_angle_cls == False:
-            pass
-            # logger.warning(
-            #     'Since the angle classifier is not initialized, it will not be used during the forward process'
-            # )
-
-        img = check_img(img)
-        # for infer pdf file
-        if isinstance(img, list):
-            if self.page_num > len(img) or self.page_num == 0:
-                self.page_num = len(img)
-            imgs = img[:self.page_num]
-        else:
-            imgs = [img]
-
-        def preprocess_image(_image):
-            _image = alpha_to_color(_image, alpha_color)
-            if inv:
-                _image = cv2.bitwise_not(_image)
-            if bin:
-                _image = binarize_img(_image)
-            return _image
-
-        if det and rec:
-            ocr_res = []
-            for idx, img in enumerate(imgs):
-                img = preprocess_image(img)
-                dt_boxes, rec_res, _ = self.__call__(img, cls, mfd_res=mfd_res)
-                if not dt_boxes and not rec_res:
-                    ocr_res.append(None)
-                    continue
-                tmp_res = [[box.tolist(), res]
-                           for box, res in zip(dt_boxes, rec_res)]
-                ocr_res.append(tmp_res)
-            return ocr_res
-        elif det and not rec:
-            ocr_res = []
-            for idx, img in enumerate(imgs):
-                img = preprocess_image(img)
-                dt_boxes, elapse = self.text_detector(img)
-                if not dt_boxes:
-                    ocr_res.append(None)
-                    continue
-                tmp_res = [box.tolist() for box in dt_boxes]
-                ocr_res.append(tmp_res)
-            return ocr_res
-        else:
-            ocr_res = []
-            cls_res = []
-            for idx, img in enumerate(imgs):
-                if not isinstance(img, list):
-                    img = preprocess_image(img)
-                    img = [img]
-                if self.use_angle_cls and cls:
-                    img, cls_res_tmp, elapse = self.text_classifier(img)
-                    if not rec:
-                        cls_res.append(cls_res_tmp)
-                rec_res, elapse = self.text_recognizer(img)
-                ocr_res.append(rec_res)
-            if not rec:
-                return cls_res
-            return ocr_res
-
-    def __call__(self, img, cls=True, mfd_res=None):
-        time_dict = {'det': 0, 'rec': 0, 'cls': 0, 'all': 0}
-
-        if img is None:
-            logger.debug("no valid image provided")
-            return None, None, time_dict
-
-        start = time.time()
-        ori_im = img.copy()
-        dt_boxes, elapse = self.text_detector(img)
-        time_dict['det'] = elapse
-
-        if dt_boxes is None:
-            logger.debug("no dt_boxes found, elapsed : {}".format(elapse))
-            end = time.time()
-            time_dict['all'] = end - start
-            return None, None, time_dict
-        else:
-            logger.debug("dt_boxes num : {}, elapsed : {}".format(
-                len(dt_boxes), elapse))
-        img_crop_list = []
-
-        dt_boxes = sorted_boxes(dt_boxes)
-
-        dt_boxes = merge_det_boxes(dt_boxes)
-
-        if mfd_res:
-            bef = time.time()
-            dt_boxes = update_det_boxes(dt_boxes, mfd_res)
-            aft = time.time()
-            logger.debug("split text box by formula, new dt_boxes num : {}, elapsed : {}".format(
-                len(dt_boxes), aft - bef))
-
-        for bno in range(len(dt_boxes)):
-            tmp_box = copy.deepcopy(dt_boxes[bno])
-            if self.args.det_box_type == "quad":
-                img_crop = get_rotate_crop_image(ori_im, tmp_box)
-            else:
-                img_crop = get_minarea_rect_crop(ori_im, tmp_box)
-            img_crop_list.append(img_crop)
-        if self.use_angle_cls and cls:
-            img_crop_list, angle_list, elapse = self.text_classifier(
-                img_crop_list)
-            time_dict['cls'] = elapse
-            logger.debug("cls num  : {}, elapsed : {}".format(
-                len(img_crop_list), elapse))
-
-        rec_res, elapse = self.text_recognizer(img_crop_list)
-        time_dict['rec'] = elapse
-        logger.debug("rec_res num  : {}, elapsed : {}".format(
-            len(rec_res), elapse))
-        if self.args.save_crop_res:
-            self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list,
-                                   rec_res)
-        filter_boxes, filter_rec_res = [], []
-        for box, rec_result in zip(dt_boxes, rec_res):
-            text, score = rec_result
-            if score >= self.drop_score:
-                filter_boxes.append(box)
-                filter_rec_res.append(rec_result)
-        end = time.time()
-        time_dict['all'] = end - start
-        return filter_boxes, filter_rec_res, time_dict
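The core of the removed `update_det_boxes` is interval arithmetic. For each OCR text box, the x-ranges of formula boxes that overlap it vertically are cut out. This leaves one or more narrower text boxes, so formulas are not passed to the text recognizer.

Below is a minimal standalone sketch of that idea:
- `merge_intervals` and `remove_intervals` follow the removed code and use closed integer intervals, hence the `- 1` and `+ 1`. Unlike the original, they do not sort or mutate their input in place.
- The vertical-overlap test is an assumption. The original calls `__is_overlaps_y_exceeds_threshold` from `magic_pdf.libs.boxbase`, whose threshold is not shown in this diff. The hypothetical `overlaps_vertically` instead checks whether the overlap covers at least `min_ratio` (0.5 here, also an assumption) of the shorter box's height.
- `split_text_bbox` is likewise a hypothetical name.

```python
from typing import List, Sequence


def merge_intervals(intervals: List[List[int]]) -> List[List[int]]:
    """Merge overlapping closed intervals [start, end] without mutating the input."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals, key=lambda iv: iv[0]):
        if not merged or merged[-1][1] < start:
            merged.append([start, end])
        else:
            merged[-1][1] = max(merged[-1][1], end)
    return merged


def remove_intervals(original: Sequence[int], masks: List[List[int]]) -> List[List[int]]:
    """Return the parts of the closed interval `original` not covered by any mask."""
    start, end = original
    result: List[List[int]] = []
    for mask_start, mask_end in merge_intervals(masks):
        if mask_start > end or mask_end < start:
            continue  # mask lies completely outside the remaining range
        if start < mask_start:
            result.append([start, mask_start - 1])
        start = max(mask_end + 1, start)
    if start <= end:
        result.append([start, end])
    return result


def overlaps_vertically(a: Sequence[int], b: Sequence[int], min_ratio: float = 0.5) -> bool:
    """Hypothetical stand-in for __is_overlaps_y_exceeds_threshold (threshold assumed)."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shorter > 0 and overlap / shorter >= min_ratio


def split_text_bbox(text_bbox: Sequence[int], formula_bboxes: List[Sequence[int]]) -> List[List[int]]:
    """Cut the x-ranges of vertically overlapping formulas out of one text bbox [x0, y0, x1, y1]."""
    masks = [[f[0], f[2]] for f in formula_bboxes if overlaps_vertically(text_bbox, f)]
    x_ranges = remove_intervals([text_bbox[0], text_bbox[2]], masks)
    return [[x0, text_bbox[1], x1, text_bbox[3]] for x0, x1 in x_ranges]
```

Consider a text line spanning x = 0..100 with one formula at x = 20..30 on the same row. The line is split into `[0, 19]` and `[31, 100]`, so the recognizer only sees the text on either side of the formula.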
			
--- a/magic_pdf/model/pek_sub_modules/__init__.py
+++ b/magic_pdf/model/pek_sub_modules/__init__.py
--- a/magic_pdf/model/pek_sub_modules/layoutlmv3/__init__.py
+++ b/magic_pdf/model/pek_sub_modules/layoutlmv3/__init__.py
--- a/magic_pdf/model/sub_modules/layout/doclayout_yolo/DocLayoutYOLO.py
+++ b/magic_pdf/model/sub_modules/layout/doclayout_yolo/DocLayoutYOLO.py
@@ -0,0 +1,21 @@
+from doclayout_yolo import YOLOv10
+
+
+class DocLayoutYOLOModel(object):
+    def __init__(self, weight, device):
+        self.model = YOLOv10(weight)
+        self.device = device
+
+    def predict(self, image):
+        layout_res = []
+        doclayout_yolo_res = self.model.predict(image, imgsz=1024, conf=0.25, iou=0.45, verbose=True, device=self.device)[0]
+        for xyxy, conf, cla in zip(doclayout_yolo_res.boxes.xyxy.cpu(), doclayout_yolo_res.boxes.conf.cpu(),
+                                   doclayout_yolo_res.boxes.cls.cpu()):
+            xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
+            new_item = {
+                'category_id': int(cla.item()),
+                'poly': [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
+                'score': round(float(conf.item()), 3),
+            }
+            layout_res.append(new_item)
+        return layout_res
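`DocLayoutYOLOModel.predict` turns each detection into the same dict shape the rest of the pipeline consumes:
- an integer `category_id`;
- an 8-value `poly` listing the corners top-left, top-right, bottom-right, bottom-left;
- a `score` rounded to three decimals.

The sketch below isolates that conversion on plain Python numbers, so it runs without `doclayout_yolo` or PyTorch. `detections_to_layout` and its `(xyxy, conf, cls)` tuple input are hypothetical stand-ins for the tensors the real method iterates over. As with `int(p.item())` in the original, `int()` truncates fractional pixel coordinates.

```python
from typing import Dict, Iterable, List, Tuple, Union

Detection = Tuple[Tuple[float, float, float, float], float, float]


def detections_to_layout(detections: Iterable[Detection]) -> List[Dict[str, Union[int, float, List[int]]]]:
    """Convert (xyxy, confidence, class) triples into layout dicts like DocLayoutYOLOModel.predict."""
    layout_res = []
    for xyxy, conf, cla in detections:
        # int() truncates toward zero, matching int(p.item()) on the tensor values
        xmin, ymin, xmax, ymax = [int(p) for p in xyxy]
        layout_res.append({
            "category_id": int(cla),
            # Four corners: top-left, top-right, bottom-right, bottom-left
            "poly": [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
            "score": round(float(conf), 3),
        })
    return layout_res
```

Keeping the axis-aligned box as an 8-value polygon lets layout detections share one format with the OCR spans, whose `poly` holds four arbitrary corner points.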
			
--- a/magic_pdf/model/sub_modules/layout/doclayout_yolo/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/doclayout_yolo/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/backbone.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/backbone.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/beit.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/beit.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/deit.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/deit.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/cord.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/cord.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/data_collator.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/data_collator.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/funsd.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/funsd.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/image_utils.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/image_utils.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/xfund.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/data/xfund.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/__init__.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/__init__.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/configuration_layoutlmv3.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/configuration_layoutlmv3.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/tokenization_layoutlmv3.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/tokenization_layoutlmv3.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/tokenization_layoutlmv3_fast.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/layoutlmft/models/layoutlmv3/tokenization_layoutlmv3_fast.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/model_init.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/model_init.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/rcnn_vl.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/rcnn_vl.py
--- a/magic_pdf/model/sub_modules/layout/layoutlmv3/visualizer.py
+++ b/magic_pdf/model/sub_modules/layout/layoutlmv3/visualizer.py
--- a/magic_pdf/model/sub_modules/mfd/__init__.py
+++ b/magic_pdf/model/sub_modules/mfd/__init__.py
--- a/magic_pdf/model/sub_modules/mfd/yolov8/YOLOv8.py
+++ b/magic_pdf/model/sub_modules/mfd/yolov8/YOLOv8.py
@@ -0,0 +1,12 @@
+from ultralytics import YOLO
+
+
+class YOLOv8MFDModel(object):
+    def __init__(self, weight, device='cpu'):
+        self.mfd_model = YOLO(weight)
+        self.device = device
+
+    def predict(self, image):
+        mfd_res = self.mfd_model.predict(image, imgsz=1888, conf=0.25, iou=0.45, verbose=True, device=self.device)[0]
+        return mfd_res
+
--- a/magic_pdf/model/sub_modules/mfd/yolov8/__init__.py
+++ b/magic_pdf/model/sub_modules/mfd/yolov8/__init__.py
--- a/magic_pdf/model/sub_modules/mfr/__init__.py
+++ b/magic_pdf/model/sub_modules/mfr/__init__.py
--- a/magic_pdf/model/sub_modules/mfr/unimernet/Unimernet.py
+++ b/magic_pdf/model/sub_modules/mfr/unimernet/Unimernet.py
@@ -0,0 +1,98 @@
 
				+import os
			
 
				+import argparse
			
 
				+import re
			
 
				+
			
 
				+from PIL import Image
			
 
				+import torch
			
 
				+from torch.utils.data import Dataset, DataLoader
			
 
				+from torchvision import transforms
			
 
				+from unimernet.common.config import Config
			
 
				+import unimernet.tasks as tasks
			
 
				+from unimernet.processors import load_processor
			
 
				+
			
 
				+
			
 
				+class MathDataset(Dataset):
			
 
				+    def __init__(self, image_paths, transform=None):
			
 
				+        self.image_paths = image_paths
			
 
				+        self.transform = transform
			
 
				+
			
 
				+    def __len__(self):
			
 
				+        return len(self.image_paths)
			
 
				+
			
 
				+    def __getitem__(self, idx):
			
 
				+        # if not pil image, then convert to pil image
			
 
				+        if isinstance(self.image_paths[idx], str):
			
 
				+            raw_image = Image.open(self.image_paths[idx])
			
 
				+        else:
			
 
				+            raw_image = self.image_paths[idx]
			
 
				+        if self.transform:
			
 
				+            raw_image = self.transform(raw_image)
			
 
				+        return raw_image
			
 
				+
			
 
				+
			
 
				+def latex_rm_whitespace(s: str):
			
 
				+    """Remove unnecessary whitespace from LaTeX code.
			
 
				+    """
			
 
				+    text_reg = r'(\\(operatorname|mathrm|text|mathbf)\s?\*? {.*?})'
			
 
				+    letter = '[a-zA-Z]'
			
 
				+    noletter = r'[\W_^\d]'
			
 
				+    names = [x[0].replace(' ', '') for x in re.findall(text_reg, s)]
			
 
				+    s = re.sub(text_reg, lambda match: str(names.pop(0)), s)
			
 
				+    news = s
			
 
				+    while True:
			
 
				+        s = news
			
 
				+        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, noletter), r'\1\2', s)
			
 
				+        news = re.sub(r'(?!\\ )(%s)\s+?(%s)' % (noletter, letter), r'\1\2', news)
			
 
				+        news = re.sub(r'(%s)\s+?(%s)' % (letter, noletter), r'\1\2', news)
			
 
				+        if news == s:
			
 
				+            break
			
 
				+    return s
			
 
				+
			
 
				+
			
 
				+class UnimernetModel(object):
			
 
				+    def __init__(self, weight_dir, cfg_path, _device_='cpu'):
			
 
				+
			
 
				+        args = argparse.Namespace(cfg_path=cfg_path, options=None)
			
 
				+        cfg = Config(args)
			
 
				+        cfg.config.model.pretrained = os.path.join(weight_dir, "pytorch_model.pth")
			
 
				+        cfg.config.model.model_config.model_name = weight_dir
			
 
				+        cfg.config.model.tokenizer_config.path = weight_dir
			
 
				+        task = tasks.setup_task(cfg)
			
 
				+        self.model = task.build_model(cfg)
			
 
				+        self.device = _device_
			
 
				+        self.model.to(_device_)
			
 
				+        self.model.eval()
			
 
				+        vis_processor = load_processor('formula_image_eval', cfg.config.datasets.formula_rec_eval.vis_processor.eval)
			
 
				+        self.mfr_transform = transforms.Compose([vis_processor, ])
			
 
				+
			
 
				+    def predict(self, mfd_res, image):
			
 
				+
			
 
				+        formula_list = []
			
 
				+        mf_image_list = []
			
 
				+        for xyxy, conf, cla in zip(mfd_res.boxes.xyxy.cpu(), mfd_res.boxes.conf.cpu(), mfd_res.boxes.cls.cpu()):
			
 
				+            xmin, ymin, xmax, ymax = [int(p.item()) for p in xyxy]
			
 
				+            new_item = {
			
 
				+                'category_id': 13 + int(cla.item()),
			
 
				+                'poly': [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
			
 
				+                'score': round(float(conf.item()), 2),
			
 
				+                'latex': '',
			
 
				+            }
			
 
				+            formula_list.append(new_item)
			
 
				+            pil_img = Image.fromarray(image)
			
 
				+            bbox_img = pil_img.crop((xmin, ymin, xmax, ymax))
			
 
				+            mf_image_list.append(bbox_img)
			
 
				+
			
 
				+        dataset = MathDataset(mf_image_list, transform=self.mfr_transform)
			
 
				+        dataloader = DataLoader(dataset, batch_size=64, num_workers=0)
			
 
				+        mfr_res = []
			
 
				+        for mf_img in dataloader:
			
 
				+            mf_img = mf_img.to(self.device)
			
 
				+            with torch.no_grad():
			
 
				+                output = self.model.generate({'image': mf_img})
			
 
				+            mfr_res.extend(output['pred_str'])
			
 
				+        for res, latex in zip(formula_list, mfr_res):
			
 
				+            res['latex'] = latex_rm_whitespace(latex)
			
 
				+        return formula_list
			
 
				+
			
 
				+
			
 
				+
			
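`UnimernetModel.predict` first turns every formula box from the detector into a layout-style item (a flattened four-corner `poly`, a rounded `score`, and `category_id = 13 + cls`), then crops those regions and feeds them to UniMERNet in batches of 64. The sketch below reproduces that bookkeeping on plain Python lists instead of the Ultralytics `boxes` tensors. Assuming the detector emits classes 0 and 1, the ids come out as 13 or 14, which are the ids `get_res_list_from_layout_res` later treats as formulas. `mfd_boxes_to_items` and `batched` are hypothetical names.

```python
from typing import Dict, List, Sequence


def mfd_boxes_to_items(xyxy: Sequence[Sequence[float]],
                       confs: Sequence[float],
                       classes: Sequence[int]) -> List[Dict]:
    """Turn detector output into layout-style formula items.

    Mirrors the dict built in UnimernetModel.predict: corners are flattened
    clockwise from the top-left, and the class index is offset by 13.
    """
    items = []
    for (x0, y0, x1, y1), conf, cls in zip(xyxy, confs, classes):
        # int() truncates toward zero, like int(p.item()) in the original.
        xmin, ymin, xmax, ymax = int(x0), int(y0), int(x1), int(y1)
        items.append({
            'category_id': 13 + int(cls),
            'poly': [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax],
            'score': round(float(conf), 2),
            'latex': '',  # filled in after recognition
        })
    return items


def batched(seq: Sequence, batch_size: int = 64) -> List[Sequence]:
    """Split crops into consecutive fixed-size batches (no shuffling)."""
    return [seq[i:i + batch_size] for i in range(0, len(seq), batch_size)]
```

Keeping `formula_list` and the crop list in the same order is what lets the original code pair each recognized LaTeX string with its box using a plain `zip`.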
--- a/magic_pdf/model/sub_modules/mfr/unimernet/__init__.py
+++ b/magic_pdf/model/sub_modules/mfr/unimernet/__init__.py
--- a/magic_pdf/model/sub_modules/model_init.py
+++ b/magic_pdf/model/sub_modules/model_init.py
@@ -0,0 +1,144 @@
 
				+from loguru import logger
			
 
				+
			
 
				+from magic_pdf.libs.Constants import MODEL_NAME
			
 
				+from magic_pdf.model.model_list import AtomicModel
			
 
				+from magic_pdf.model.sub_modules.layout.doclayout_yolo.DocLayoutYOLO import DocLayoutYOLOModel
			
 
				+from magic_pdf.model.sub_modules.layout.layoutlmv3.model_init import Layoutlmv3_Predictor
			
 
				+from magic_pdf.model.sub_modules.mfd.yolov8.YOLOv8 import YOLOv8MFDModel
			
 
				+
			
 
				+from magic_pdf.model.sub_modules.mfr.unimernet.Unimernet import UnimernetModel
			
 
				+from magic_pdf.model.sub_modules.ocr.paddleocr.ppocr_273_mod import ModifiedPaddleOCR
			
 
				+# from magic_pdf.model.sub_modules.ocr.paddleocr.ppocr_291_mod import ModifiedPaddleOCR
			
 
				+from magic_pdf.model.sub_modules.table.structeqtable.struct_eqtable import StructTableModel
			
 
				+from magic_pdf.model.sub_modules.table.tablemaster.tablemaster_paddle import TableMasterPaddleModel
			
 
				+from magic_pdf.model.sub_modules.table.rapidtable.rapid_table import RapidTableModel
			
 
				+
			
 
				+
			
 
				+def table_model_init(table_model_type, model_path, max_time, _device_='cpu'):
			
 
				+    if table_model_type == MODEL_NAME.STRUCT_EQTABLE:
			
 
				+        table_model = StructTableModel(model_path, max_new_tokens=2048, max_time=max_time)
			
 
				+    elif table_model_type == MODEL_NAME.TABLE_MASTER:
			
 
				+        config = {
			
 
				+            "model_dir": model_path,
			
 
				+            "device": _device_
			
 
				+        }
			
 
				+        table_model = TableMasterPaddleModel(config)
			
 
				+    elif table_model_type == MODEL_NAME.RAPID_TABLE:
			
 
				+        table_model = RapidTableModel()
			
 
				+    else:
			
 
				+        logger.error("table model type not allowed")
			
 
				+        exit(1)
			
 
				+
			
 
				+    return table_model
			
 
				+
			
 
				+
			
 
				+def mfd_model_init(weight, device='cpu'):
			
 
				+    mfd_model = YOLOv8MFDModel(weight, device)
			
 
				+    return mfd_model
			
 
				+
			
 
				+
			
 
				+def mfr_model_init(weight_dir, cfg_path, device='cpu'):
			
 
				+    mfr_model = UnimernetModel(weight_dir, cfg_path, device)
			
 
				+    return mfr_model
			
 
				+
			
 
				+
			
 
				+def layout_model_init(weight, config_file, device):
			
 
				+    model = Layoutlmv3_Predictor(weight, config_file, device)
			
 
				+    return model
			
 
				+
			
 
				+
			
 
				+def doclayout_yolo_model_init(weight, device='cpu'):
			
 
				+    model = DocLayoutYOLOModel(weight, device)
			
 
				+    return model
			
 
				+
			
 
				+
			
 
				+def ocr_model_init(show_log: bool = False,
			
 
				+                   det_db_box_thresh=0.3,
			
 
				+                   lang=None,
			
 
				+                   use_dilation=True,
			
 
				+                   det_db_unclip_ratio=1.8,
			
 
				+                   ):
			
 
				+    if lang is not None:
			
 
				+        model = ModifiedPaddleOCR(
			
 
				+            show_log=show_log,
			
 
				+            det_db_box_thresh=det_db_box_thresh,
			
 
				+            lang=lang,
			
 
				+            use_dilation=use_dilation,
			
 
				+            det_db_unclip_ratio=det_db_unclip_ratio,
			
 
				+        )
			
 
				+    else:
			
 
				+        model = ModifiedPaddleOCR(
			
 
				+            show_log=show_log,
			
 
				+            det_db_box_thresh=det_db_box_thresh,
			
 
				+            use_dilation=use_dilation,
			
 
				+            det_db_unclip_ratio=det_db_unclip_ratio,
			
 
				+            # use_angle_cls=True,
			
 
				+        )
			
 
				+    return model
			
 
				+
			
 
				+
			
 
				+class AtomModelSingleton:
			
 
				+    _instance = None
			
 
				+    _models = {}
			
 
				+
			
 
				+    def __new__(cls, *args, **kwargs):
			
 
				+        if cls._instance is None:
			
 
				+            cls._instance = super().__new__(cls)
			
 
				+        return cls._instance
			
 
				+
			
 
				+    def get_atom_model(self, atom_model_name: str, **kwargs):
			
 
				+        lang = kwargs.get("lang", None)
			
 
				+        layout_model_name = kwargs.get("layout_model_name", None)
			
 
				+        key = (atom_model_name, layout_model_name, lang)
			
 
				+        if key not in self._models:
			
 
				+            self._models[key] = atom_model_init(model_name=atom_model_name, **kwargs)
			
 
				+        return self._models[key]
			
 
				+
			
 
				+
			
 
				+def atom_model_init(model_name: str, **kwargs):
			
 
				+    atom_model = None
			
 
				+    if model_name == AtomicModel.Layout:
			
 
				+        if kwargs.get("layout_model_name") == MODEL_NAME.LAYOUTLMv3:
			
 
				+            atom_model = layout_model_init(
			
 
				+                kwargs.get("layout_weights"),
			
 
				+                kwargs.get("layout_config_file"),
			
 
				+                kwargs.get("device")
			
 
				+            )
			
 
				+        elif kwargs.get("layout_model_name") == MODEL_NAME.DocLayout_YOLO:
			
 
				+            atom_model = doclayout_yolo_model_init(
			
 
				+                kwargs.get("doclayout_yolo_weights"),
			
 
				+                kwargs.get("device")
			
 
				+            )
			
 
				+    elif model_name == AtomicModel.MFD:
			
 
				+        atom_model = mfd_model_init(
			
 
				+            kwargs.get("mfd_weights"),
			
 
				+            kwargs.get("device")
			
 
				+        )
			
 
				+    elif model_name == AtomicModel.MFR:
			
 
				+        atom_model = mfr_model_init(
			
 
				+            kwargs.get("mfr_weight_dir"),
			
 
				+            kwargs.get("mfr_cfg_path"),
			
 
				+            kwargs.get("device")
			
 
				+        )
			
 
				+    elif model_name == AtomicModel.OCR:
			
 
				+        atom_model = ocr_model_init(
			
 
				+            kwargs.get("ocr_show_log"),
			
 
				+            kwargs.get("det_db_box_thresh"),
			
 
				+            kwargs.get("lang")
			
 
				+        )
			
 
				+    elif model_name == AtomicModel.Table:
			
 
				+        atom_model = table_model_init(
			
 
				+            kwargs.get("table_model_name"),
			
 
				+            kwargs.get("table_model_path"),
			
 
				+            kwargs.get("table_max_time"),
			
 
				+            kwargs.get("device")
			
 
				+        )
			
 
				+    else:
			
 
				+        logger.error("model name not allowed")
			
 
				+        exit(1)
			
 
				+
			
 
				+    if atom_model is None:
			
 
				+        logger.error("model init failed")
			
 
				+        exit(1)
			
 
				+    else:
			
 
				+        return atom_model
			
--- a/magic_pdf/model/sub_modules/model_utils.py
+++ b/magic_pdf/model/sub_modules/model_utils.py
@@ -0,0 +1,51 @@
 
				+import time
			
 
				+
			
 
				+import torch
			
 
				+from PIL import Image
			
 
				+from loguru import logger
			
 
				+
			
 
				+from magic_pdf.libs.clean_memory import clean_memory
			
 
				+
			
 
				+
			
 
				+def crop_img(input_res, input_pil_img, crop_paste_x=0, crop_paste_y=0):
			
 
				+    crop_xmin, crop_ymin = int(input_res['poly'][0]), int(input_res['poly'][1])
			
 
				+    crop_xmax, crop_ymax = int(input_res['poly'][4]), int(input_res['poly'][5])
			
 
				+    # Create a white canvas padded by crop_paste_x / crop_paste_y on each side
			
 
				+    crop_new_width = crop_xmax - crop_xmin + crop_paste_x * 2
			
 
				+    crop_new_height = crop_ymax - crop_ymin + crop_paste_y * 2
			
 
				+    return_image = Image.new('RGB', (crop_new_width, crop_new_height), 'white')
			
 
				+
			
 
				+    # Crop image
			
 
				+    crop_box = (crop_xmin, crop_ymin, crop_xmax, crop_ymax)
			
 
				+    cropped_img = input_pil_img.crop(crop_box)
			
 
				+    return_image.paste(cropped_img, (crop_paste_x, crop_paste_y))
			
 
				+    return_list = [crop_paste_x, crop_paste_y, crop_xmin, crop_ymin, crop_xmax, crop_ymax, crop_new_width, crop_new_height]
			
 
				+    return return_image, return_list
			
 
				+
			
 
				+
			
 
				+# Select regions for OCR / formula regions / table regions
			
 
				+def get_res_list_from_layout_res(layout_res):
			
 
				+    ocr_res_list = []
			
 
				+    table_res_list = []
			
 
				+    single_page_mfdetrec_res = []
			
 
				+    for res in layout_res:
			
 
				+        if int(res['category_id']) in [13, 14]:
			
 
				+            single_page_mfdetrec_res.append({
			
 
				+                "bbox": [int(res['poly'][0]), int(res['poly'][1]),
			
 
				+                         int(res['poly'][4]), int(res['poly'][5])],
			
 
				+            })
			
 
				+        elif int(res['category_id']) in [0, 1, 2, 4, 6, 7]:
			
 
				+            ocr_res_list.append(res)
			
 
				+        elif int(res['category_id']) in [5]:
			
 
				+            table_res_list.append(res)
			
 
				+    return ocr_res_list, table_res_list, single_page_mfdetrec_res
			
 
				+
			
 
				+
			
 
				+def clean_vram(device, vram_threshold=8):
			
 
				+    if torch.cuda.is_available() and device != 'cpu':
			
 
				+        total_memory = torch.cuda.get_device_properties(device).total_memory / (1024 ** 3)  # convert bytes to GB
			
 
				+        if total_memory <= vram_threshold:
			
 
				+            gc_start = time.time()
			
 
				+            clean_memory()
			
 
				+            gc_time = round(time.time() - gc_start, 2)
			
 
				+            logger.info(f"gc time: {gc_time}")
			
--- a/magic_pdf/model/sub_modules/ocr/__init__.py
+++ b/magic_pdf/model/sub_modules/ocr/__init__.py
--- a/magic_pdf/model/sub_modules/ocr/paddleocr/__init__.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr/__init__.py
--- a/magic_pdf/model/sub_modules/ocr/paddleocr/ocr_utils.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr/ocr_utils.py
@@ -0,0 +1,259 @@
 
				+import math
			
 
				+
			
 
				+import numpy as np
			
 
				+from loguru import logger
			
 
				+
			
 
				+from magic_pdf.libs.boxbase import __is_overlaps_y_exceeds_threshold
			
 
				+from magic_pdf.pre_proc.ocr_dict_merge import merge_spans_to_line
			
 
				+
			
 
				+
			
 
				+def bbox_to_points(bbox):
			
 
				+    """Convert a bbox [x0, y0, x1, y1] into an array of its four corner points."""
			
 
				+    x0, y0, x1, y1 = bbox
			
 
				+    return np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]]).astype('float32')
			
 
				+
			
 
				+
			
 
				+def points_to_bbox(points):
			
 
				+    """Convert an array of four corner points back into a bbox [x0, y0, x1, y1]."""
			
 
				+    x0, y0 = points[0]
			
 
				+    x1, _ = points[1]
			
 
				+    _, y1 = points[2]
			
 
				+    return [x0, y0, x1, y1]
			
 
				+
			
 
				+
			
 
				+def merge_intervals(intervals):
			
 
				+    # Sort the intervals based on the start value
			
 
				+    intervals.sort(key=lambda x: x[0])
			
 
				+
			
 
				+    merged = []
			
 
				+    for interval in intervals:
			
 
				+        # If the list of merged intervals is empty or if the current
			
 
				+        # interval does not overlap with the previous, simply append it.
			
 
				+        if not merged or merged[-1][1] < interval[0]:
			
 
				+            merged.append(interval)
			
 
				+        else:
			
 
				+            # Otherwise, there is overlap, so we merge the current and previous intervals.
			
 
				+            merged[-1][1] = max(merged[-1][1], interval[1])
			
 
				+
			
 
				+    return merged
			
 
				+
			
 
				+
			
 
				+def remove_intervals(original, masks):
			
 
				+    # Merge all mask intervals
			
 
				+    merged_masks = merge_intervals(masks)
			
 
				+
			
 
				+    result = []
			
 
				+    original_start, original_end = original
			
 
				+
			
 
				+    for mask in merged_masks:
			
 
				+        mask_start, mask_end = mask
			
 
				+
			
 
				+        # If the mask starts after the original range, ignore it
			
 
				+        if mask_start > original_end:
			
 
				+            continue
			
 
				+
			
 
				+        # If the mask ends before the original range starts, ignore it
			
 
				+        if mask_end < original_start:
			
 
				+            continue
			
 
				+
			
 
				+        # Remove the masked part from the original range
			
 
				+        if original_start < mask_start:
			
 
				+            result.append([original_start, mask_start - 1])
			
 
				+
			
 
				+        original_start = max(mask_end + 1, original_start)
			
 
				+
			
 
				+    # Add the remaining part of the original range, if any
			
 
				+    if original_start <= original_end:
			
 
				+        result.append([original_start, original_end])
			
 
				+
			
 
				+    return result
			
 
				+
			
 
				+
			
 
				+def update_det_boxes(dt_boxes, mfd_res):
			
 
				+    new_dt_boxes = []
			
 
				+    for text_box in dt_boxes:
			
 
				+        text_bbox = points_to_bbox(text_box)
			
 
				+        masks_list = []
			
 
				+        for mf_box in mfd_res:
			
 
				+            mf_bbox = mf_box['bbox']
			
 
				+            if __is_overlaps_y_exceeds_threshold(text_bbox, mf_bbox):
			
 
				+                masks_list.append([mf_bbox[0], mf_bbox[2]])
			
 
				+        text_x_range = [text_bbox[0], text_bbox[2]]
			
 
				+        text_remove_mask_range = remove_intervals(text_x_range, masks_list)
			
 
				+        temp_dt_box = []
			
 
				+        for text_remove_mask in text_remove_mask_range:
			
 
				+            temp_dt_box.append(bbox_to_points([text_remove_mask[0], text_bbox[1], text_remove_mask[1], text_bbox[3]]))
			
 
				+        if len(temp_dt_box) > 0:
			
 
				+            new_dt_boxes.extend(temp_dt_box)
			
 
				+    return new_dt_boxes
			
 
				+
			
 
				+
			
 
				+def merge_overlapping_spans(spans):
			
 
				+    """
			
 
				+    Merges overlapping spans on the same line.
			
 
				+
			
 
				+    :param spans: A list of span coordinates [(x1, y1, x2, y2), ...]
			
 
				+    :return: A list of merged spans
			
 
				+    """
			
 
				+    # Return an empty list if the input spans list is empty
			
 
				+    if not spans:
			
 
				+        return []
			
 
				+
			
 
				+    # Sort spans by their starting x-coordinate
			
 
				+    spans.sort(key=lambda x: x[0])
			
 
				+
			
 
				+    # Initialize the list of merged spans
			
 
				+    merged = []
			
 
				+    for span in spans:
			
 
				+        # Unpack span coordinates
			
 
				+        x1, y1, x2, y2 = span
			
 
				+        # If the merged list is empty or there's no horizontal overlap, add the span directly
			
 
				+        if not merged or merged[-1][2] < x1:
			
 
				+            merged.append(span)
			
 
				+        else:
			
 
				+            # If there is horizontal overlap, merge the current span with the previous one
			
 
				+            last_span = merged.pop()
			
 
				+            # Update the merged span's top-left corner to the smaller (x1, y1) and bottom-right to the larger (x2, y2)
			
 
				+            x1 = min(last_span[0], x1)
			
 
				+            y1 = min(last_span[1], y1)
			
 
				+            x2 = max(last_span[2], x2)
			
 
				+            y2 = max(last_span[3], y2)
			
 
				+            # Add the merged span back to the list
			
 
				+            merged.append((x1, y1, x2, y2))
			
 
				+
			
 
				+    # Return the list of merged spans
			
 
				+    return merged
			
 
				+
			
 
				+
			
 
				+def merge_det_boxes(dt_boxes):
			
 
				+    """
			
 
				+    Merge detection boxes.
			
 
				+
			
 
				+    This function takes a list of detected bounding boxes, each represented by four corner points.
			
 
				+    The goal is to merge these bounding boxes into larger text regions.
			
 
				+
			
 
				+    Parameters:
			
 
				+    dt_boxes (list): A list containing multiple text detection boxes, where each box is defined by four corner points.
			
 
				+
			
 
				+    Returns:
			
 
				+    list: A list containing the merged text regions, where each region is represented by four corner points.
			
 
				+    """
			
 
				+    # Convert the detection boxes into a dictionary format with bounding boxes and type
			
 
				+    dt_boxes_dict_list = []
			
 
				+    angle_boxes_list = []
			
 
				+    for text_box in dt_boxes:
			
 
				+        text_bbox = points_to_bbox(text_box)
			
 
				+        if text_bbox[2] <= text_bbox[0] or text_bbox[3] <= text_bbox[1]:
			
 
				+            angle_boxes_list.append(text_box)
			
 
				+            continue
			
 
				+        text_box_dict = {
			
 
				+            'bbox': text_bbox,
			
 
				+            'type': 'text',
			
 
				+        }
			
 
				+        dt_boxes_dict_list.append(text_box_dict)
			
 
				+
			
 
				+    # Merge adjacent text regions into lines
			
 
				+    lines = merge_spans_to_line(dt_boxes_dict_list)
			
 
				+
			
 
				+    # Initialize a new list for storing the merged text regions
			
 
				+    new_dt_boxes = []
			
 
				+    for line in lines:
			
 
				+        line_bbox_list = []
			
 
				+        for span in line:
			
 
				+            line_bbox_list.append(span['bbox'])
			
 
				+
			
 
				+        # Merge overlapping text regions within the same line
			
 
				+        merged_spans = merge_overlapping_spans(line_bbox_list)
			
 
				+
			
 
				+        # Convert the merged text regions back to point format and add them to the new detection box list
			
 
				+        for span in merged_spans:
			
 
				+            new_dt_boxes.append(bbox_to_points(span))
			
 
				+
			
 
				+    new_dt_boxes.extend(angle_boxes_list)
			
 
				+
			
 
				+    return new_dt_boxes
			
 
				+
			
 
				+
			
 
				+def get_adjusted_mfdetrec_res(single_page_mfdetrec_res, useful_list):
			
 
				+    paste_x, paste_y, xmin, ymin, xmax, ymax, new_width, new_height = useful_list
			
 
				+    # Adjust the coordinates of the formula area
			
 
				+    adjusted_mfdetrec_res = []
			
 
				+    for mf_res in single_page_mfdetrec_res:
			
 
				+        mf_xmin, mf_ymin, mf_xmax, mf_ymax = mf_res["bbox"]
			
 
				+        # Adjust the coordinates of the formula area to the coordinates relative to the cropping area
			
 
				+        x0 = mf_xmin - xmin + paste_x
			
 
				+        y0 = mf_ymin - ymin + paste_y
			
 
				+        x1 = mf_xmax - xmin + paste_x
			
 
				+        y1 = mf_ymax - ymin + paste_y
			
 
				+        # Skip formula boxes that lie entirely outside the cropped image
			
 
				+        if any([x1 < 0, y1 < 0]) or any([x0 > new_width, y0 > new_height]):
			
 
				+            continue
			
 
				+        else:
			
 
				+            adjusted_mfdetrec_res.append({
			
 
				+                "bbox": [x0, y0, x1, y1],
			
 
				+            })
			
 
				+    return adjusted_mfdetrec_res
			
 
				+
			
 
				+
			
 
				+def get_ocr_result_list(ocr_res, useful_list):
			
 
				+    paste_x, paste_y, xmin, ymin, xmax, ymax, new_width, new_height = useful_list
			
 
				+    ocr_result_list = []
			
 
				+    for box_ocr_res in ocr_res:
			
 
				+
			
 
				+        p1, p2, p3, p4 = box_ocr_res[0]
			
 
				+        text, score = box_ocr_res[1]
			
 
				+        average_angle_degrees = calculate_angle_degrees(box_ocr_res[0])
			
 
				+        if average_angle_degrees > 0.5:
			
 
				+            # logger.info(f"average_angle_degrees: {average_angle_degrees}, text: {text}")
			
 
				+            # The box is tilted more than 0.5 degrees from the x-axis; straighten its boundary
			
 
				+            # Compute the geometric center
			
 
				+            x_center = sum(point[0] for point in box_ocr_res[0]) / 4
			
 
				+            y_center = sum(point[1] for point in box_ocr_res[0]) / 4
			
 
				+            box_height = ((p4[1] - p1[1]) + (p3[1] - p2[1])) / 2
			
 
				+            box_width = p3[0] - p1[0]
			
 
				+            p1 = [x_center - box_width / 2, y_center - box_height / 2]
			
 
				+            p2 = [x_center + box_width / 2, y_center - box_height / 2]
			
 
				+            p3 = [x_center + box_width / 2, y_center + box_height / 2]
			
 
				+            p4 = [x_center - box_width / 2, y_center + box_height / 2]
			
 
				+
			
 
				+        # Convert the coordinates back to the original coordinate system
			
 
				+        p1 = [p1[0] - paste_x + xmin, p1[1] - paste_y + ymin]
			
 
				+        p2 = [p2[0] - paste_x + xmin, p2[1] - paste_y + ymin]
			
 
				+        p3 = [p3[0] - paste_x + xmin, p3[1] - paste_y + ymin]
			
 
				+        p4 = [p4[0] - paste_x + xmin, p4[1] - paste_y + ymin]
			
 
				+
			
 
				+        ocr_result_list.append({
			
 
				+            'category_id': 15,
			
 
				+            'poly': p1 + p2 + p3 + p4,
			
 
				+            'score': float(round(score, 2)),
			
 
				+            'text': text,
			
 
				+        })
			
 
				+
			
 
				+    return ocr_result_list
			
 
				+
			
 
				+
			
 
				+def calculate_angle_degrees(poly):
			
 
				+    # Endpoints of the two diagonals
			
 
				+    diagonal1 = (poly[0], poly[2])
			
 
				+    diagonal2 = (poly[1], poly[3])
			
 
				+
			
 
				+    # Slope of each diagonal (infinite when the diagonal is vertical)
			
 
				+    def slope(p1, p2):
			
 
				+        return (p2[1] - p1[1]) / (p2[0] - p1[0]) if p2[0] != p1[0] else float('inf')
			
 
				+
			
 
				+    slope1 = slope(diagonal1[0], diagonal1[1])
			
 
				+    slope2 = slope(diagonal2[0], diagonal2[1])
			
 
				+
			
 
				+    # Angle between each diagonal and the x-axis, in radians
			
 
				+    angle1_radians = math.atan(slope1)
			
 
				+    angle2_radians = math.atan(slope2)
			
 
				+
			
 
				+    # Convert radians to degrees
			
 
				+    angle1_degrees = math.degrees(angle1_radians)
			
 
				+    angle2_degrees = math.degrees(angle2_radians)
			
 
				+
			
 
				+    # Average the two diagonals' angles to the x-axis
			
 
				+    average_angle_degrees = abs((angle1_degrees + angle2_degrees) / 2)
			
 
				+    # logger.info(f"average_angle_degrees: {average_angle_degrees}")
			
 
				+    return average_angle_degrees
			
 
				+
			
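`update_det_boxes` splits an OCR text box wherever a formula box overlaps it vertically: it subtracts the formulas' x-ranges from the text box's x-range (`merge_intervals`, then `remove_intervals`) and keeps each remaining piece as its own box. Separately, `get_ocr_result_list` straightens any box whose `calculate_angle_degrees` exceeds 0.5. The sketch below reproduces both computations in isolation. `subtract_intervals` is a swapped-in variant that sorts the masks instead of merging them first; it and `tilt_degrees` are hypothetical names.

```python
import math
from typing import List, Sequence


def subtract_intervals(span: Sequence[int], masks: List[List[int]]) -> List[List[int]]:
    """Remove the closed intervals `masks` from the closed interval `span`.

    As in remove_intervals, pieces are separated by 1 unit (integer pixel
    coordinates). Sorting the masks and only ever advancing `start` makes a
    separate merge step unnecessary.
    """
    start, end = span
    pieces: List[List[int]] = []
    for m_start, m_end in sorted(masks):
        if m_end < start or m_start > end:
            continue  # mask does not touch what is left of the span
        if start < m_start:
            pieces.append([start, m_start - 1])
        start = max(start, m_end + 1)
    if start <= end:
        pieces.append([start, end])
    return pieces


def tilt_degrees(quad: Sequence[Sequence[float]]) -> float:
    """Absolute mean angle of the box's two diagonals to the x-axis, in degrees.

    `quad` is [top-left, top-right, bottom-right, bottom-left]. For an
    axis-aligned box the diagonals have opposite slopes and cancel out; for a
    slightly rotated rectangle the mean approximates the rotation angle.
    """
    def angle(p, q):
        dx, dy = q[0] - p[0], q[1] - p[1]
        # A vertical diagonal counts as 90 degrees, like atan(inf) in the original.
        return 90.0 if dx == 0 else math.degrees(math.atan(dy / dx))

    return abs((angle(quad[0], quad[2]) + angle(quad[1], quad[3])) / 2)
```

For example, removing formulas spanning x = 20..30 and 50..60 from a text line spanning 0..100 leaves the pieces `[[0, 19], [31, 49], [61, 100]]`.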
--- a/magic_pdf/model/sub_modules/ocr/paddleocr/ppocr_273_mod.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr/ppocr_273_mod.py
@@ -0,0 +1,168 @@
 
				+import copy
			
 
				+import time
			
 
				+
			
 
				+import cv2
			
 
				+import numpy as np
			
 
				+from paddleocr import PaddleOCR
			
 
				+from paddleocr.paddleocr import check_img, logger
			
 
				+from paddleocr.ppocr.utils.utility import alpha_to_color, binarize_img
			
 
				+from paddleocr.tools.infer.predict_system import sorted_boxes
			
 
				+from paddleocr.tools.infer.utility import get_rotate_crop_image, get_minarea_rect_crop
			
 
				+
			
 
				+from magic_pdf.model.sub_modules.ocr.paddleocr.ocr_utils import update_det_boxes, merge_det_boxes
			
 
				+
			
 
				+
			
 
				+class ModifiedPaddleOCR(PaddleOCR):
			
 
				+    def ocr(self,
			
 
				+            img,
			
 
				+            det=True,
			
 
				+            rec=True,
			
 
				+            cls=True,
			
 
				+            bin=False,
			
 
				+            inv=False,
			
 
				+            alpha_color=(255, 255, 255),
			
 
				+            mfd_res=None,
			
 
				+            ):
			
 
				+        """
			
 
				+        OCR with PaddleOCR
			
 
				+        args:
			
 
				+            img: image for OCR; supports an ndarray, an image path, raw bytes, or a list of ndarrays
			
 
				+            det: whether to run text detection. If False, only recognition is executed. Default is True.
			
 
				+            rec: whether to run text recognition. If False, only detection is executed. Default is True.
			
 
				+            cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
			
 
				+            bin: binarize image to black and white. Default is False.
			
 
				+            inv: invert image colors. Default is False.
			
 
				+            alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white.
			
 
				+        """
			
 
				+        assert isinstance(img, (np.ndarray, list, str, bytes))
			
 
				+        if isinstance(img, list) and det == True:
			
 
				+            logger.error('When input a list of images, det must be false')
			
 
				+            exit(0)
			
 
				+        if cls == True and self.use_angle_cls == False:
			
 
				+            pass
			
 
				+            # logger.warning(
			
 
				+            #     'Since the angle classifier is not initialized, it will not be used during the forward process'
			
 
				+            # )
			
 
				+
			
 
				+        img = check_img(img)
			
 
				+        # for infer pdf file
			
 
				+        if isinstance(img, list):
			
 
				+            if self.page_num > len(img) or self.page_num == 0:
			
 
				+                self.page_num = len(img)
			
 
				+            imgs = img[:self.page_num]
			
 
				+        else:
			
 
				+            imgs = [img]
			
 
				+
			
 
				+        def preprocess_image(_image):
			
 
				+            _image = alpha_to_color(_image, alpha_color)
			
 
				+            if inv:
			
 
				+                _image = cv2.bitwise_not(_image)
			
 
				+            if bin:
			
 
				+                _image = binarize_img(_image)
			
 
				+            return _image
			
 
				+
			
 
				+        if det and rec:
			
 
				+            ocr_res = []
			
 
				+            for idx, img in enumerate(imgs):
			
 
				+                img = preprocess_image(img)
			
 
				+                dt_boxes, rec_res, _ = self.__call__(img, cls, mfd_res=mfd_res)
			
 
				+                if not dt_boxes and not rec_res:
			
 
				+                    ocr_res.append(None)
			
 
				+                    continue
			
 
				+                tmp_res = [[box.tolist(), res]
			
 
				+                           for box, res in zip(dt_boxes, rec_res)]
			
 
				+                ocr_res.append(tmp_res)
			
 
				+            return ocr_res
			
 
				+        elif det and not rec:
			
 
				+            ocr_res = []
			
 
				+            for idx, img in enumerate(imgs):
			
 
				+                img = preprocess_image(img)
			
 
				+                dt_boxes, elapse = self.text_detector(img)
			
 
				+                if dt_boxes is None or len(dt_boxes) == 0:
			
 
				+                    ocr_res.append(None)
			
 
				+                    continue
			
 
				+                tmp_res = [box.tolist() for box in dt_boxes]
			
 
				+                ocr_res.append(tmp_res)
			
 
				+            return ocr_res
			
 
				+        else:
			
 
				+            ocr_res = []
			
 
				+            cls_res = []
			
 
				+            for idx, img in enumerate(imgs):
			
 
				+                if not isinstance(img, list):
			
 
				+                    img = preprocess_image(img)
			
 
				+                    img = [img]
			
 
				+                if self.use_angle_cls and cls:
			
 
				+                    img, cls_res_tmp, elapse = self.text_classifier(img)
			
 
				+                    if not rec:
			
 
				+                        cls_res.append(cls_res_tmp)
			
 
				+                rec_res, elapse = self.text_recognizer(img)
			
 
				+                ocr_res.append(rec_res)
			
 
				+            if not rec:
			
 
				+                return cls_res
			
 
				+            return ocr_res
			
 
				+
			
 
				+    def __call__(self, img, cls=True, mfd_res=None):
			
 
				+        time_dict = {'det': 0, 'rec': 0, 'cls': 0, 'all': 0}
			
 
				+
			
 
				+        if img is None:
			
 
				+            logger.debug("no valid image provided")
			
 
				+            return None, None, time_dict
			
 
				+
			
 
				+        start = time.time()
			
 
				+        ori_im = img.copy()
			
 
				+        dt_boxes, elapse = self.text_detector(img)
			
 
				+        time_dict['det'] = elapse
			
 
				+
			
 
				+        if dt_boxes is None:
			
 
				+            logger.debug("no dt_boxes found, elapsed : {}".format(elapse))
			
 
				+            end = time.time()
			
 
				+            time_dict['all'] = end - start
			
 
				+            return None, None, time_dict
			
 
				+        else:
			
 
				+            logger.debug("dt_boxes num : {}, elapsed : {}".format(
			
 
				+                len(dt_boxes), elapse))
			
 
				+        img_crop_list = []
			
 
				+
			
 
				+        dt_boxes = sorted_boxes(dt_boxes)
			
 
				+
			
 
				+        # @todo Merging currently happens at the bbox level, which handles skewed text lines poorly; switch to a polygon-aware merge
			
 
				+        # dt_boxes = merge_det_boxes(dt_boxes)
			
 
				+
			
 
				+
			
 
				+        if mfd_res:
			
 
				+            bef = time.time()
			
 
				+            dt_boxes = update_det_boxes(dt_boxes, mfd_res)
			
 
				+            aft = time.time()
			
 
				+            logger.debug("split text box by formula, new dt_boxes num : {}, elapsed : {}".format(
			
 
				+                len(dt_boxes), aft - bef))
			
 
				+
			
 
				+        for bno in range(len(dt_boxes)):
			
 
				+            tmp_box = copy.deepcopy(dt_boxes[bno])
			
 
				+            if self.args.det_box_type == "quad":
			
 
				+                img_crop = get_rotate_crop_image(ori_im, tmp_box)
			
 
				+            else:
			
 
				+                img_crop = get_minarea_rect_crop(ori_im, tmp_box)
			
 
				+            img_crop_list.append(img_crop)
			
 
				+        if self.use_angle_cls and cls:
			
 
				+            img_crop_list, angle_list, elapse = self.text_classifier(
			
 
				+                img_crop_list)
			
 
				+            time_dict['cls'] = elapse
			
 
				+            logger.debug("cls num  : {}, elapsed : {}".format(
			
 
				+                len(img_crop_list), elapse))
			
 
				+
			
 
				+        rec_res, elapse = self.text_recognizer(img_crop_list)
			
 
				+        time_dict['rec'] = elapse
			
 
				+        logger.debug("rec_res num  : {}, elapsed : {}".format(
			
 
				+            len(rec_res), elapse))
			
 
				+        if self.args.save_crop_res:
			
 
				+            self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list,
			
 
				+                                   rec_res)
			
 
				+        filter_boxes, filter_rec_res = [], []
			
 
				+        for box, rec_result in zip(dt_boxes, rec_res):
			
 
				+            text, score = rec_result
			
 
				+            if score >= self.drop_score:
			
 
				+                filter_boxes.append(box)
			
 
				+                filter_rec_res.append(rec_result)
			
 
				+        end = time.time()
			
 
				+        time_dict['all'] = end - start
			
 
				+        return filter_boxes, filter_rec_res, time_dict
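`__call__` orders the detected boxes with `sorted_boxes` (imported from `paddleocr.tools.infer.predict_system`) before cropping and recognition. The sketch below shows that ordering as we understand it: sort by the top-left corner's `(y, x)`, then swap neighbours that lie on roughly the same line but are out of left-to-right order. The function name `sort_boxes_reading_order` and the 10-pixel `y_tol` default are our assumptions, not a PaddleOCR API.

```python
from typing import List

import numpy as np


def sort_boxes_reading_order(dt_boxes: np.ndarray, y_tol: float = 10.0) -> List[np.ndarray]:
    """Order (N, 4, 2) quadrilaterals top-to-bottom, then left-to-right within a line."""
    # Primary key: the top-left corner's y, then its x
    boxes = sorted(dt_boxes, key=lambda b: (b[0][1], b[0][0]))
    # Move a box leftwards past neighbours that sit on roughly the same line but further right
    for i in range(len(boxes) - 1):
        for j in range(i, -1, -1):
            same_line = abs(boxes[j + 1][0][1] - boxes[j][0][1]) < y_tol
            if same_line and boxes[j + 1][0][0] < boxes[j][0][0]:
                boxes[j], boxes[j + 1] = boxes[j + 1], boxes[j]
            else:
                break
    return boxes


# Two words on a slightly skewed first line (the right word starts 4 px higher) and one word below
boxes = np.array([
    [[100, 10], [150, 10], [150, 30], [100, 30]],
    [[10, 14], [60, 14], [60, 34], [10, 34]],
    [[10, 40], [60, 40], [60, 60], [10, 60]],
], dtype=np.float32)
print([b[0].tolist() for b in sort_boxes_reading_order(boxes)])
# → [[10.0, 14.0], [100.0, 10.0], [10.0, 40.0]]
```

A plain `(y, x)` sort would read the right-hand word first because it starts a few pixels higher; the swap pass fixes that, while boxes more than `y_tol` apart vertically keep their top-to-bottom order.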
			
--- a/magic_pdf/model/sub_modules/ocr/paddleocr/ppocr_291_mod.py
+++ b/magic_pdf/model/sub_modules/ocr/paddleocr/ppocr_291_mod.py
@@ -0,0 +1,213 @@
 
				+import copy
			
 
				+import time
			
 
				+
			
 
				+
			
 
				+import cv2
			
 
				+import numpy as np
			
 
				+from paddleocr import PaddleOCR
			
 
				+from paddleocr.paddleocr import check_img, logger
			
 
				+from paddleocr.ppocr.utils.utility import alpha_to_color, binarize_img
			
 
				+from paddleocr.tools.infer.predict_system import sorted_boxes
			
 
				+from paddleocr.tools.infer.utility import slice_generator, merge_fragmented, get_rotate_crop_image, \
			
 
				+    get_minarea_rect_crop
			
 
				+
			
 
				+from magic_pdf.model.sub_modules.ocr.paddleocr.ocr_utils import update_det_boxes
			
 
				+
			
 
				+
			
 
				+class ModifiedPaddleOCR(PaddleOCR):
			
 
				+
			
 
				+    def ocr(
			
 
				+        self,
			
 
				+        img,
			
 
				+        det=True,
			
 
				+        rec=True,
			
 
				+        cls=True,
			
 
				+        bin=False,
			
 
				+        inv=False,
			
 
				+        alpha_color=(255, 255, 255),
			
 
				+        slice={},
			
 
				+        mfd_res=None,
			
 
				+    ):
			
 
				+        """
			
 
				+        OCR with PaddleOCR
			
 
				+
			
 
				+        Args:
			
 
				+            img: Image for OCR. It can be an ndarray, img_path, or a list of ndarrays.
			
 
				+            det: Use text detection or not. If False, only text recognition will be executed. Default is True.
			
 
				+            rec: Use text recognition or not. If False, only text detection will be executed. Default is True.
			
 
				+            cls: Use the angle classifier or not. Default is True. If True, text rotated by 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False for better performance.
			
 
				+            bin: Binarize image to black and white. Default is False.
			
 
				+            inv: Invert image colors. Default is False.
			
 
				+            alpha_color: Set RGB color Tuple for transparent parts replacement. Default is pure white.
			
 
				+            slice: Use sliding window inference for large images. Both det and rec must be True. Requires int values for slice["horizontal_stride"], slice["vertical_stride"], slice["merge_x_thres"], slice["merge_y_thres"] (See doc/doc_en/slice_en.md). Default is {}.
			
 
				+
			
 
				+        Returns:
			
 
				+            If both det and rec are True, returns a list of OCR results for each image. Each OCR result is a list of bounding boxes and recognized text for each detected text region.
			
 
				+            If det is True and rec is False, returns a list of detected bounding boxes for each image.
			
 
				+            If det is False and rec is True, returns a list of recognized text for each image.
			
 
				+            If both det and rec are False, returns a list of angle classification results for each image.
			
 
				+
			
 
				+        Raises:
			
 
				+            AssertionError: If the input image is not of type ndarray, list, str, or bytes.
			
 
				+            SystemExit: If det is True and the input is a list of images.
			
 
				+
			
 
				+        Note:
			
 
				+            - If the angle classifier is not initialized (use_angle_cls=False), it will not be used during the forward process.
			
 
				+            - For PDF input, only the first self.page_num pages are processed when page_num is non-zero and does not exceed the page count; otherwise all pages are processed.
			
 
				+            - The preprocess_image function is used to preprocess the input image by applying alpha color replacement, inversion, and binarization if specified.
			
 
				+        """
			
 
				+        assert isinstance(img, (np.ndarray, list, str, bytes))
			
 
				+        if isinstance(img, list) and det == True:
			
 
				+            logger.error("When the input is a list of images, det must be False")
			
 
				+            exit(0)
			
 
				+        if cls == True and self.use_angle_cls == False:
			
 
				+            logger.warning(
			
 
				+                "Since the angle classifier is not initialized, it will not be used during the forward process"
			
 
				+            )
			
 
				+
			
 
				+        img, flag_gif, flag_pdf = check_img(img, alpha_color)
			
 
				+        # for infer pdf file
			
 
				+        if isinstance(img, list) and flag_pdf:
			
 
				+            if self.page_num > len(img) or self.page_num == 0:
			
 
				+                imgs = img
			
 
				+            else:
			
 
				+                imgs = img[: self.page_num]
			
 
				+        else:
			
 
				+            imgs = [img]
			
 
				+
			
 
				+        def preprocess_image(_image):
			
 
				+            _image = alpha_to_color(_image, alpha_color)
			
 
				+            if inv:
			
 
				+                _image = cv2.bitwise_not(_image)
			
 
				+            if bin:
			
 
				+                _image = binarize_img(_image)
			
 
				+            return _image
			
 
				+
			
 
				+        if det and rec:
			
 
				+            ocr_res = []
			
 
				+            for img in imgs:
			
 
				+                img = preprocess_image(img)
			
 
				+                dt_boxes, rec_res, _ = self.__call__(img, cls, slice, mfd_res=mfd_res)
			
 
				+                if not dt_boxes and not rec_res:
			
 
				+                    ocr_res.append(None)
			
 
				+                    continue
			
 
				+                tmp_res = [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
			
 
				+                ocr_res.append(tmp_res)
			
 
				+            return ocr_res
			
 
				+        elif det and not rec:
			
 
				+            ocr_res = []
			
 
				+            for img in imgs:
			
 
				+                img = preprocess_image(img)
			
 
				+                dt_boxes, elapse = self.text_detector(img)
			
 
				+                if dt_boxes.size == 0:
			
 
				+                    ocr_res.append(None)
			
 
				+                    continue
			
 
				+                tmp_res = [box.tolist() for box in dt_boxes]
			
 
				+                ocr_res.append(tmp_res)
			
 
				+            return ocr_res
			
 
				+        else:
			
 
				+            ocr_res = []
			
 
				+            cls_res = []
			
 
				+            for img in imgs:
			
 
				+                if not isinstance(img, list):
			
 
				+                    img = preprocess_image(img)
			
 
				+                    img = [img]
			
 
				+                if self.use_angle_cls and cls:
			
 
				+                    img, cls_res_tmp, elapse = self.text_classifier(img)
			
 
				+                    if not rec:
			
 
				+                        cls_res.append(cls_res_tmp)
			
 
				+                rec_res, elapse = self.text_recognizer(img)
			
 
				+                ocr_res.append(rec_res)
			
 
				+            if not rec:
			
 
				+                return cls_res
			
 
				+            return ocr_res
			
 
				+
			
 
				+    def __call__(self, img, cls=True, slice={}, mfd_res=None):
			
 
				+        time_dict = {"det": 0, "rec": 0, "cls": 0, "all": 0}
			
 
				+
			
 
				+        if img is None:
			
 
				+            logger.debug("no valid image provided")
			
 
				+            return None, None, time_dict
			
 
				+
			
 
				+        start = time.time()
			
 
				+        ori_im = img.copy()
			
 
				+        if slice:
			
 
				+            slice_gen = slice_generator(
			
 
				+                img,
			
 
				+                horizontal_stride=slice["horizontal_stride"],
			
 
				+                vertical_stride=slice["vertical_stride"],
			
 
				+            )
			
 
				+            elapsed = []
			
 
				+            dt_slice_boxes = []
			
 
				+            for slice_crop, v_start, h_start in slice_gen:
			
 
				+                dt_boxes, elapse = self.text_detector(slice_crop, use_slice=True)
			
 
				+                if dt_boxes.size:
			
 
				+                    dt_boxes[:, :, 0] += h_start
			
 
				+                    dt_boxes[:, :, 1] += v_start
			
 
				+                    dt_slice_boxes.append(dt_boxes)
			
 
				+                    elapsed.append(elapse)
			
 
				+            dt_boxes = np.concatenate(dt_slice_boxes)
			
 
				+
			
 
				+            dt_boxes = merge_fragmented(
			
 
				+                boxes=dt_boxes,
			
 
				+                x_threshold=slice["merge_x_thres"],
			
 
				+                y_threshold=slice["merge_y_thres"],
			
 
				+            )
			
 
				+            elapse = sum(elapsed)
			
 
				+        else:
			
 
				+            dt_boxes, elapse = self.text_detector(img)
			
 
				+
			
 
				+        time_dict["det"] = elapse
			
 
				+
			
 
				+        if dt_boxes is None:
			
 
				+            logger.debug("no dt_boxes found, elapsed : {}".format(elapse))
			
 
				+            end = time.time()
			
 
				+            time_dict["all"] = end - start
			
 
				+            return None, None, time_dict
			
 
				+        else:
			
 
				+            logger.debug(
			
 
				+                "dt_boxes num : {}, elapsed : {}".format(len(dt_boxes), elapse)
			
 
				+            )
			
 
				+        img_crop_list = []
			
 
				+
			
 
				+        dt_boxes = sorted_boxes(dt_boxes)
			
 
				+
			
 
				+        if mfd_res:
			
 
				+            bef = time.time()
			
 
				+            dt_boxes = update_det_boxes(dt_boxes, mfd_res)
			
 
				+            aft = time.time()
			
 
				+            logger.debug("split text box by formula, new dt_boxes num : {}, elapsed : {}".format(
			
 
				+                len(dt_boxes), aft - bef))
			
 
				+
			
 
				+        for bno in range(len(dt_boxes)):
			
 
				+            tmp_box = copy.deepcopy(dt_boxes[bno])
			
 
				+            if self.args.det_box_type == "quad":
			
 
				+                img_crop = get_rotate_crop_image(ori_im, tmp_box)
			
 
				+            else:
			
 
				+                img_crop = get_minarea_rect_crop(ori_im, tmp_box)
			
 
				+            img_crop_list.append(img_crop)
			
 
				+        if self.use_angle_cls and cls:
			
 
				+            img_crop_list, angle_list, elapse = self.text_classifier(img_crop_list)
			
 
				+            time_dict["cls"] = elapse
			
 
				+            logger.debug(
			
 
				+                "cls num  : {}, elapsed : {}".format(len(img_crop_list), elapse)
			
 
				+            )
			
 
				+        if len(img_crop_list) > 1000:
			
 
				+            logger.debug(
			
 
				+                f"rec crops num: {len(img_crop_list)}, time and memory cost may be large."
			
 
				+            )
			
 
				+
			
 
				+        rec_res, elapse = self.text_recognizer(img_crop_list)
			
 
				+        time_dict["rec"] = elapse
			
 
				+        logger.debug("rec_res num  : {}, elapsed : {}".format(len(rec_res), elapse))
			
 
				+        if self.args.save_crop_res:
			
 
				+            self.draw_crop_rec_res(self.args.crop_res_save_dir, img_crop_list, rec_res)
			
 
				+        filter_boxes, filter_rec_res = [], []
			
 
				+        for box, rec_result in zip(dt_boxes, rec_res):
			
 
				+            text, score = rec_result[0], rec_result[1]
			
 
				+            if score >= self.drop_score:
			
 
				+                filter_boxes.append(box)
			
 
				+                filter_rec_res.append(rec_result)
			
 
				+        end = time.time()
			
 
				+        time_dict["all"] = end - start
			
 
				+        return filter_boxes, filter_rec_res, time_dict
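When `slice` is set, `__call__` runs the detector on each window produced by `slice_generator`, shifts the boxes by the window's `h_start`/`v_start`, and concatenates them before `merge_fragmented`. A minimal sketch of that offset step follows, assuming simple non-overlapping windows: `iter_slices` is a hypothetical stand-in for `slice_generator` (whose exact windowing we do not reproduce), `toy_detector` is a made-up detector for the demo, and fragment merging is omitted. It also guards the case where no window yields a box, because `np.concatenate([])` raises a `ValueError`.

```python
from typing import Callable, Iterator, Tuple

import numpy as np


def iter_slices(img: np.ndarray, h_stride: int, v_stride: int) -> Iterator[Tuple[np.ndarray, int, int]]:
    """Yield (crop, v_start, h_start) for non-overlapping windows (hypothetical slice_generator stand-in)."""
    height, width = img.shape[:2]
    for v_start in range(0, height, v_stride):
        for h_start in range(0, width, h_stride):
            yield img[v_start:v_start + v_stride, h_start:h_start + h_stride], v_start, h_start


def detect_with_slices(img: np.ndarray, detector: Callable[[np.ndarray], np.ndarray],
                       h_stride: int, v_stride: int) -> np.ndarray:
    """Run `detector` per window and map its (K, 4, 2) boxes back to full-image coordinates."""
    per_slice = []
    for crop, v_start, h_start in iter_slices(img, h_stride, v_stride):
        boxes = detector(crop)
        if boxes.size:
            boxes = boxes.copy()
            boxes[:, :, 0] += h_start  # x shifts by the window's left edge
            boxes[:, :, 1] += v_start  # y shifts by the window's top edge
            per_slice.append(boxes)
    if not per_slice:
        # np.concatenate([]) raises ValueError, so return an empty (0, 4, 2) array instead
        return np.zeros((0, 4, 2), dtype=np.float32)
    return np.concatenate(per_slice)


def toy_detector(crop: np.ndarray) -> np.ndarray:
    """Toy detector: one axis-aligned box around all non-zero pixels in the crop."""
    ys, xs = np.nonzero(crop)
    if ys.size == 0:
        return np.zeros((0, 4, 2), dtype=np.float32)
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    return np.array([[[x0, y0], [x1, y0], [x1, y1], [x0, y1]]], dtype=np.float32)


page = np.zeros((20, 20), dtype=np.uint8)
page[12:14, 5:8] = 255  # a small "text" blob inside the bottom-left window
print(detect_with_slices(page, toy_detector, h_stride=10, v_stride=10).tolist())
# → [[[5.0, 12.0], [8.0, 12.0], [8.0, 14.0], [5.0, 14.0]]]
```

The detector only ever sees window-local coordinates, so the box found at local `(5, 2)` in the window starting at row 10 comes back as `(5, 12)` on the full page.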
			
--- a/magic_pdf/model/sub_modules/reading_oreder/__init__.py
+++ b/magic_pdf/model/sub_modules/reading_oreder/__init__.py
--- a/magic_pdf/model/sub_modules/reading_oreder/layoutreader/__init__.py
+++ b/magic_pdf/model/sub_modules/reading_oreder/layoutreader/__init__.py
--- a/magic_pdf/model/sub_modules/reading_oreder/layoutreader/helpers.py
+++ b/magic_pdf/model/sub_modules/reading_oreder/layoutreader/helpers.py
--- a/magic_pdf/model/sub_modules/reading_oreder/layoutreader/xycut.py
+++ b/magic_pdf/model/sub_modules/reading_oreder/layoutreader/xycut.py
@@ -0,0 +1,242 @@
 
				+from typing import List
			
 
				+import cv2
			
 
				+import numpy as np
			
 
				+
			
 
				+
			
 
				+def projection_by_bboxes(boxes: np.ndarray, axis: int) -> np.ndarray:
			
 
				+    """
			
 
				+    Compute a projection histogram from a set of bboxes, output as a per-pixel count.
			
 
				+
			
 
				+    Args:
			
 
				+        boxes: [N, 4]
			
 
				+        axis: 0 - project the x coordinates onto the horizontal axis; 1 - project the y coordinates onto the vertical axis
			
 
				+
			
 
				+    Returns:
			
 
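				+        1D projection histogram whose length is the maximum coordinate along the projection axis (the actual image size is not needed, since we only look for gaps between text boxes)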
			
 
				+
			
 
				+    """
			
 
				+    assert axis in [0, 1]
			
 
				+    length = np.max(boxes[:, axis::2])
			
 
				+    res = np.zeros(length, dtype=int)
			
 
				+    # TODO: how to remove for loop?
			
 
				+    for start, end in boxes[:, axis::2]:
			
 
				+        res[start:end] += 1
			
 
				+    return res
			
 
				+
			
 
				+
			
 
				+# from: https://dothinking.github.io/2021-06-19-%E9%80%92%E5%BD%92%E6%8A%95%E5%BD%B1%E5%88%86%E5%89%B2%E7%AE%97%E6%B3%95/#:~:text=%E9%80%92%E5%BD%92%E6%8A%95%E5%BD%B1%E5%88%86%E5%89%B2%EF%BC%88Recursive%20XY,%EF%BC%8C%E5%8F%AF%E4%BB%A5%E5%88%92%E5%88%86%E6%AE%B5%E8%90%BD%E3%80%81%E8%A1%8C%E3%80%82
			
 
				+def split_projection_profile(arr_values: np.ndarray, min_value: float, min_gap: float):
			
 
				+    """Split projection profile:
			
 
				+
			
 
				+    ```
			
 
				+                              ┌──┐
			
 
				+         arr_values           │  │       ┌─┐───
			
 
				+             ┌──┐             │  │       │ │ |
			
 
				+             │  │             │  │ ┌───┐ │ │min_value
			
 
				+             │  │<- min_gap ->│  │ │   │ │ │ |
			
 
				+         ────┴──┴─────────────┴──┴─┴───┴─┴─┴─┴───
			
 
				+         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
			
 
				+    ```
			
 
				+
			
 
				+    Args:
			
 
				+        arr_values (np.array): 1-d array representing the projection profile.
			
 
				+        min_value (float): Ignore positions where `arr_values` is less than or equal to `min_value`.
			
 
				+        min_gap (float): Ignore the gap if less than this value.
			
 
				+
			
 
				+    Returns:
			
 
				+        tuple: Start indexes and (exclusive) end indexes of split groups, or None if no value exceeds `min_value`.
			
 
				+    """
			
 
				+    # all indexes with projection height exceeding the threshold
			
 
				+    arr_index = np.where(arr_values > min_value)[0]
			
 
				+    if not len(arr_index):
			
 
				+        return
			
 
				+
			
 
				+    # find zero intervals between adjacent projections
			
 
				+    # |  |                    ||
			
 
				+    # ||||<- zero-interval -> |||||
			
 
				+    arr_diff = arr_index[1:] - arr_index[0:-1]
			
 
				+    arr_diff_index = np.where(arr_diff > min_gap)[0]
			
 
				+    arr_zero_intvl_start = arr_index[arr_diff_index]
			
 
				+    arr_zero_intvl_end = arr_index[arr_diff_index + 1]
			
 
				+
			
 
				+    # convert to index of projection range:
			
 
				+    # the start index of zero interval is the end index of projection
			
 
				+    arr_start = np.insert(arr_zero_intvl_end, 0, arr_index[0])
			
 
				+    arr_end = np.append(arr_zero_intvl_start, arr_index[-1])
			
 
				+    arr_end += 1  # end index will be excluded as index slice
			
 
				+
			
 
				+    return arr_start, arr_end
			
 
				+
			
 
				+
			
 
				+def recursive_xy_cut(boxes: np.ndarray, indices: np.ndarray, res: List[int]):
			
 
				+    """
			
 
				+
			
 
				+    Args:
			
 
				+        boxes: (N, 4)
			
 
				+        indices: (N,) array; throughout the recursion, holds each box's index in the original data
			
 
				+        res: list that collects the output (box indices in reading order)
			
 
				+
			
 
				+    """
			
 
				+    # Project onto the y axis
			
 
				+    assert len(boxes) == len(indices)
			
 
				+
			
 
				+    _indices = boxes[:, 1].argsort()
			
 
				+    y_sorted_boxes = boxes[_indices]
			
 
				+    y_sorted_indices = indices[_indices]
			
 
				+
			
 
				+    # debug_vis(y_sorted_boxes, y_sorted_indices)
			
 
				+
			
 
				+    y_projection = projection_by_bboxes(boxes=y_sorted_boxes, axis=1)
			
 
				+    pos_y = split_projection_profile(y_projection, 0, 1)
			
 
				+    if not pos_y:
			
 
				+        return
			
 
				+
			
 
				+    arr_y0, arr_y1 = pos_y
			
 
				+    for r0, r1 in zip(arr_y0, arr_y1):
			
 
				+        # [r0, r1] is a horizontal band that contains bboxes; each band is then split vertically
			
 
				+        _indices = (r0 <= y_sorted_boxes[:, 1]) & (y_sorted_boxes[:, 1] < r1)
			
 
				+
			
 
				+        y_sorted_boxes_chunk = y_sorted_boxes[_indices]
			
 
				+        y_sorted_indices_chunk = y_sorted_indices[_indices]
			
 
				+
			
 
				+        _indices = y_sorted_boxes_chunk[:, 0].argsort()
			
 
				+        x_sorted_boxes_chunk = y_sorted_boxes_chunk[_indices]
			
 
				+        x_sorted_indices_chunk = y_sorted_indices_chunk[_indices]
			
 
				+
			
 
				+        # Project along the x direction
			
 
				+        x_projection = projection_by_bboxes(boxes=x_sorted_boxes_chunk, axis=0)
			
 
				+        pos_x = split_projection_profile(x_projection, 0, 1)
			
 
				+        if not pos_x:
			
 
				+            continue
			
 
				+
			
 
				+        arr_x0, arr_x1 = pos_x
			
 
				+        if len(arr_x0) == 1:
			
 
				+            # Cannot be split along x
			
 
				+            res.extend(x_sorted_indices_chunk)
			
 
				+            continue
			
 
				+
			
 
				+        # Separable along x: recurse into each column
			
 
				+        for c0, c1 in zip(arr_x0, arr_x1):
			
 
				+            _indices = (c0 <= x_sorted_boxes_chunk[:, 0]) & (
			
 
				+                x_sorted_boxes_chunk[:, 0] < c1
			
 
				+            )
			
 
				+            recursive_xy_cut(
			
 
				+                x_sorted_boxes_chunk[_indices], x_sorted_indices_chunk[_indices], res
			
 
				+            )
			
 
				+
			
 
				+
			
 
				+def points_to_bbox(points):
			
 
				+    assert len(points) == 8
			
 
				+
			
 
				+    # [x1,y1,x2,y2,x3,y3,x4,y4]
			
 
				+    left = min(points[::2])
			
 
				+    right = max(points[::2])
			
 
				+    top = min(points[1::2])
			
 
				+    bottom = max(points[1::2])
			
 
				+
			
 
				+    left = max(left, 0)
			
 
				+    top = max(top, 0)
			
 
				+    right = max(right, 0)
			
 
				+    bottom = max(bottom, 0)
			
 
				+    return [left, top, right, bottom]
			
 
				+
			
 
				+
			
 
				+def bbox2points(bbox):
			
 
				+    left, top, right, bottom = bbox
			
 
				+    return [left, top, right, top, right, bottom, left, bottom]
			
 
				+
			
 
				+
			
 
				+def vis_polygon(img, points, thickness=2, color=None):
			
 
				+    br2bl_color = color
			
 
				+    tl2tr_color = color
			
 
				+    tr2br_color = color
			
 
				+    bl2tl_color = color
			
 
				+    cv2.line(
			
 
				+        img,
			
 
				+        (points[0][0], points[0][1]),
			
 
				+        (points[1][0], points[1][1]),
			
 
				+        color=tl2tr_color,
			
 
				+        thickness=thickness,
			
 
				+    )
			
 
				+
			
 
				+    cv2.line(
			
 
				+        img,
			
 
				+        (points[1][0], points[1][1]),
			
 
				+        (points[2][0], points[2][1]),
			
 
				+        color=tr2br_color,
			
 
				+        thickness=thickness,
			
 
				+    )
			
 
				+
			
 
				+    cv2.line(
			
 
				+        img,
			
 
				+        (points[2][0], points[2][1]),
			
 
				+        (points[3][0], points[3][1]),
			
 
				+        color=br2bl_color,
			
 
				+        thickness=thickness,
			
 
				+    )
			
 
				+
			
 
				+    cv2.line(
			
 
				+        img,
			
 
				+        (points[3][0], points[3][1]),
			
 
				+        (points[0][0], points[0][1]),
			
 
				+        color=bl2tl_color,
			
 
				+        thickness=thickness,
			
 
				+    )
			
 
				+    return img
			
 
				+
			
 
				+
			
 
				+def vis_points(
			
 
				+    img: np.ndarray, points, texts: List[str] = None, color=(0, 200, 0)
			
 
				+) -> np.ndarray:
			
 
				+    """
			
 
				+
			
 
				+    Args:
			
 
				+        img:
			
 
				+        points: [N, 8]  8: x1,y1,x2,y2,x3,y3,x4,y4
			
 
				+        texts:
			
 
				+        color:
			
 
				+
			
 
				+    Returns:
			
 
				+
			
 
				+    """
			
 
				+    points = np.array(points)
			
 
				+    if texts is not None:
			
 
				+        assert len(texts) == points.shape[0]
			
 
				+
			
 
				+    for i, _points in enumerate(points):
			
 
				+        vis_polygon(img, _points.reshape(-1, 2), thickness=2, color=color)
			
 
				+        bbox = points_to_bbox(_points)
			
 
				+        left, top, right, bottom = bbox
			
 
				+        cx = (left + right) // 2
			
 
				+        cy = (top + bottom) // 2
			
 
				+
			
 
				+        txt = texts[i] if texts is not None else str(i)  # texts defaults to None; fall back to the index
			
 
				+        font = cv2.FONT_HERSHEY_SIMPLEX
			
 
				+        cat_size = cv2.getTextSize(txt, font, 0.5, 2)[0]
			
 
				+
			
 
				+        img = cv2.rectangle(
			
 
				+            img,
			
 
				+            (cx - 5 * len(txt), cy - cat_size[1] - 5),
			
 
				+            (cx - 5 * len(txt) + cat_size[0], cy - 5),
			
 
				+            color,
			
 
				+            -1,
			
 
				+        )
			
 
				+
			
 
				+        img = cv2.putText(
			
 
				+            img,
			
 
				+            txt,
			
 
				+            (cx - 5 * len(txt), cy - 5),
			
 
				+            font,
			
 
				+            0.5,
			
 
				+            (255, 255, 255),
			
 
				+            thickness=1,
			
 
				+            lineType=cv2.LINE_AA,
			
 
				+        )
			
 
				+
			
 
				+    return img
			
 
				+
			
 
				+
			
 
				+def vis_polygons_with_index(image, points):
			
 
				+    texts = [str(i) for i in range(len(points))]
			
 
				+    res_img = vis_points(image.copy(), points, texts)
			
 
				+    return res_img
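`projection_by_bboxes`, `split_projection_profile` and `recursive_xy_cut` together implement a recursive XY-cut: project the boxes onto the y axis, split at empty gaps into horizontal bands, then project each band onto the x axis and recurse into every column. The condensed, self-contained restatement below runs it on a toy two-column page. The names `projection`, `split_profile` and `xy_cut` are ours, and we pass `kind="stable"` to `argsort` (the original uses NumPy's default sort kind), so treat it as an illustrative sketch rather than a drop-in replacement.

```python
from typing import List, Optional, Tuple

import numpy as np


def projection(boxes: np.ndarray, axis: int) -> np.ndarray:
    """Per-pixel coverage count along x (axis=0) or y (axis=1) for (N, 4) boxes."""
    length = int(np.max(boxes[:, axis::2]))
    hist = np.zeros(length, dtype=int)
    for start, end in boxes[:, axis::2]:
        hist[start:end] += 1
    return hist


def split_profile(values: np.ndarray, min_value: int = 0,
                  min_gap: int = 1) -> Optional[Tuple[np.ndarray, np.ndarray]]:
    """Return (starts, exclusive ends) of covered runs separated by gaps wider than min_gap."""
    idx = np.where(values > min_value)[0]
    if not len(idx):
        return None
    gaps = np.where(idx[1:] - idx[:-1] > min_gap)[0]
    starts = np.insert(idx[gaps + 1], 0, idx[0])
    ends = np.append(idx[gaps], idx[-1]) + 1
    return starts, ends


def xy_cut(boxes: np.ndarray, indices: np.ndarray, res: List[int]) -> None:
    order = boxes[:, 1].argsort(kind="stable")  # top-to-bottom
    boxes, indices = boxes[order], indices[order]
    pos_y = split_profile(projection(boxes, axis=1))
    if pos_y is None:
        return
    for r0, r1 in zip(*pos_y):  # each horizontal band
        mask = (r0 <= boxes[:, 1]) & (boxes[:, 1] < r1)
        row_boxes, row_idx = boxes[mask], indices[mask]
        order = row_boxes[:, 0].argsort(kind="stable")  # left-to-right
        row_boxes, row_idx = row_boxes[order], row_idx[order]
        pos_x = split_profile(projection(row_boxes, axis=0))
        if pos_x is None:
            continue
        if len(pos_x[0]) == 1:  # no vertical gap: emit the band as-is
            res.extend(row_idx.tolist())
            continue
        for c0, c1 in zip(*pos_x):  # recurse into each column
            mask = (c0 <= row_boxes[:, 0]) & (row_boxes[:, 0] < c1)
            xy_cut(row_boxes[mask], row_idx[mask], res)


page_boxes = np.array([
    [10, 0, 90, 10],   # 0: title spanning both columns
    [10, 20, 45, 50],  # 1: left column, first paragraph
    [55, 20, 90, 35],  # 2: right column, first paragraph
    [10, 55, 45, 70],  # 3: left column, second paragraph
    [55, 40, 90, 70],  # 4: right column, second paragraph
])
order: List[int] = []
xy_cut(page_boxes, np.arange(len(page_boxes)), order)
print(order)  # → [0, 1, 3, 2, 4]
```

Because the y cut is tried first, columns are read one at a time only when no horizontal gap runs across both of them. If the paragraph gaps in both columns lined up at the same height, the band split would interleave the columns row by row.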
			
--- a/magic_pdf/model/sub_modules/table/__init__.py
+++ b/magic_pdf/model/sub_modules/table/__init__.py
--- a/magic_pdf/model/sub_modules/table/rapidtable/__init__.py
+++ b/magic_pdf/model/sub_modules/table/rapidtable/__init__.py
--- a/magic_pdf/model/sub_modules/table/rapidtable/rapid_table.py
+++ b/magic_pdf/model/sub_modules/table/rapidtable/rapid_table.py
@@ -0,0 +1,14 @@
 
				+import numpy as np
			
 
				+from rapid_table import RapidTable
			
 
				+from rapidocr_paddle import RapidOCR
			
 
				+
			
 
				+
			
 
				+class RapidTableModel(object):
			
 
				+    def __init__(self):
			
 
				+        self.table_model = RapidTable()
			
 
				+        self.ocr_engine = RapidOCR(det_use_cuda=True, cls_use_cuda=True, rec_use_cuda=True)
			
 
				+
			
 
				+    def predict(self, image):
			
 
				+        ocr_result, _ = self.ocr_engine(np.asarray(image))
			
 
				+        html_code, table_cell_bboxes, elapse = self.table_model(np.asarray(image), ocr_result)
			
 
				+        return html_code, table_cell_bboxes, elapse
			
--- a/magic_pdf/model/sub_modules/table/structeqtable/__init__.py
+++ b/magic_pdf/model/sub_modules/table/structeqtable/__init__.py
--- a/magic_pdf/model/sub_modules/table/structeqtable/struct_eqtable.py
+++ b/magic_pdf/model/sub_modules/table/structeqtable/struct_eqtable.py
@@ -1,8 +1,8 @@
 
				-import re
			
 
				-
			
 
				 import torch
			
 
				 from struct_eqtable import build_model
			
 
				 
			
 
				+from magic_pdf.model.sub_modules.table.table_utils import minify_html
			
 
				+
			
 
				 
			
 
				 class StructTableModel:
			
 
				     def __init__(self, model_path, max_new_tokens=1024, max_time=60):
			
@@ -31,15 +31,7 @@ class StructTableModel:
 
				         )
			
 
				 
			
 
				         if output_format == "html":
			
 
				-            results = [self.minify_html(html) for html in results]
			
 
				+            results = [minify_html(html) for html in results]
			
 
				 
			
 
				         return results
			
 
				 
			
 
				-    def minify_html(self, html):
			
 
				-        # Collapse runs of whitespace into a single space
			
 
				-        html = re.sub(r'\s+', ' ', html)
			
 
				-        # Remove whitespace around '>' (e.g. at line ends)
			
 
				-        html = re.sub(r'\s*>\s*', '>', html)
			
 
				-        # Remove whitespace around '<' (e.g. before tags)
			
 
				-        html = re.sub(r'\s*<\s*', '<', html)
			
 
				-        return html.strip()
			
--- a/magic_pdf/model/sub_modules/table/table_utils.py
+++ b/magic_pdf/model/sub_modules/table/table_utils.py
@@ -0,0 +1,11 @@
 
				+import re
			
 
				+
			
 
				+
			
 
				+def minify_html(html):
			
 
				+    # Collapse runs of whitespace into a single space
			
 
				+    html = re.sub(r'\s+', ' ', html)
			
 
				+    # Remove whitespace around '>' (e.g. at line ends)
			
 
				+    html = re.sub(r'\s*>\s*', '>', html)
			
 
				+    # Remove whitespace around '<' (e.g. before tags)
			
 
				+    html = re.sub(r'\s*<\s*', '<', html)
			
 
				+    return html.strip()
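`minify_html` now lives in `table_utils` so the table back-ends can share it. Running a copy of it on a small table snippet (the sample HTML is our own) shows what the three substitutions do:

```python
import re


def minify_html(html: str) -> str:
    html = re.sub(r'\s+', ' ', html)      # collapse whitespace runs to one space
    html = re.sub(r'\s*>\s*', '>', html)  # drop whitespace around '>'
    html = re.sub(r'\s*<\s*', '<', html)  # drop whitespace around '<'
    return html.strip()


sample = "<table>\n  <tr>\n    <td> A </td>\n    <td>x > 1</td>\n  </tr>\n</table>\n"
print(minify_html(sample))  # → <table><tr><td>A</td><td>x>1</td></tr></table>
```

Note the second cell: the patterns cannot tell markup from text, so whitespace around a literal `<` or `>` inside cell content is removed as well (`x > 1` becomes `x>1`). Escaped entities such as `&gt;` contain no `>` character and are left untouched.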
			
--- a/magic_pdf/model/sub_modules/table/tablemaster/__init__.py
+++ b/magic_pdf/model/sub_modules/table/tablemaster/__init__.py
--- a/magic_pdf/model/sub_modules/table/tablemaster/tablemaster_paddle.py
+++ b/magic_pdf/model/sub_modules/table/tablemaster/tablemaster_paddle.py
@@ -7,7 +7,7 @@ from PIL import Image
 
				 import numpy as np
			
 
				 
			
 
				 
			
 
				-class ppTableModel(object):
			
 
				+class TableMasterPaddleModel(object):
			
 
				     """
			
 
				         This class is responsible for converting image of table into HTML format using a pre-trained model.
			
 
				 
			
--- a/magic_pdf/para/para_split_v3.py
+++ b/magic_pdf/para/para_split_v3.py
@@ -77,14 +77,12 @@ def __is_list_or_index_block(block):
 
				 
			
 
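				         # If the first line is indented on the left but flush on the right, and the last line is flush on the left but not on the right (the first line may also fall short on the right)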
			
 
				         if (first_line['bbox'][0] - block['bbox_fs'][0] > line_height / 2 and
			
 
				-                # block['bbox_fs'][2] - first_line['bbox'][2] < line_height and
			
 
				                 abs(last_line['bbox'][0] - block['bbox_fs'][0]) < line_height / 2 and
			
 
				                 block['bbox_fs'][2] - last_line['bbox'][2] > line_height
			
 
				         ):
			
 
				             multiple_para_flag = True
			
 
				 
			
 
				         for line in block['lines']:
			
 
				-
			
 
				             line_mid_x = (line['bbox'][0] + line['bbox'][2]) / 2
			
 
				             block_mid_x = (block['bbox_fs'][0] + block['bbox_fs'][2]) / 2
			
 
				             if (
			
@@ -102,13 +100,13 @@ def __is_list_or_index_block(block):
 
				                 if span_type == ContentType.Text:
			
 
				                     line_text += span['content'].strip()
			
 
				 
			
 
				+            # Append the text of every line, including empty ones, so the list stays the same length as block['lines']
			
 
				             lines_text_list.append(line_text)
			
 
				 
			
 
				             # Count lines flush with the left edge (to check whether there are more than 2); a line is flush if abs(block['bbox_fs'][0] - line['bbox'][0]) < line_height/2
			
 
				             if abs(block['bbox_fs'][0] - line['bbox'][0]) < line_height / 2:
			
 
				                 left_close_num += 1
			
 
				             elif line['bbox'][0] - block['bbox_fs'][0] > line_height:
			
 
				-                # logger.info(f"{line_text}, {block['bbox_fs']}, {line['bbox']}")
			
 
				                 left_not_close_num += 1
			
 
				 
			
 
				             # Check whether the line is flush with the right edge
			
@@ -117,7 +115,6 @@ def __is_list_or_index_block(block):
 
				             else:
			
 
				                 # When the line is not flush right, check for a sizable gap; the threshold is a rule-of-thumb fraction of the block width
			
 
				                 closed_area = 0.26 * block_weight
			
 
				-                # closed_area = 5 * line_height
			
 
				                 if block['bbox_fs'][2] - line['bbox'][2] > closed_area:
			
 
				                     right_not_close_num += 1
			
 
				 
			
@@ -128,6 +125,7 @@ def __is_list_or_index_block(block):
 
				         num_start_count = 0
			
 
				         num_end_count = 0
			
 
				         flag_end_count = 0
			
 
				+
			
 
				         if len(lines_text_list) > 0:
			
 
				             for line_text in lines_text_list:
			
 
				                 if len(line_text) > 0:
			
@@ -138,11 +136,10 @@ def __is_list_or_index_block(block):
 
				                     if line_text[-1].isdigit():
			
 
				                         num_end_count += 1
			
 
				 
			
 
				-            if flag_end_count / len(lines_text_list) >= 0.8:
			
 
				-                line_end_flag = True
			
 
				-
			
 
				             if num_start_count / len(lines_text_list) >= 0.8 or num_end_count / len(lines_text_list) >= 0.8:
			
 
				                 line_num_flag = True
			
 
				+            if flag_end_count / len(lines_text_list) >= 0.8:
			
 
				+                line_end_flag = True
			
 
				 
			
 
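				         # Some tables of contents are not flush on the right; for now, a block counts as an index if its left or right side is fully flush and it satisfies the numbering rule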
			
 
				         if ((left_close_num / len(block['lines']) >= 0.8 or right_close_num / len(block['lines']) >= 0.8)
			
@@ -176,7 +173,7 @@ def __is_list_or_index_block(block):
 
				                 # Case where most line items carry an end marker: split items at the end markers
			
 
				                 elif line_end_flag:
			
 
				                     for i, line in enumerate(block['lines']):
			
 
				-                        if lines_text_list[i][-1] in LIST_END_FLAG:
			
 
				+                        if len(lines_text_list[i]) > 0 and lines_text_list[i][-1] in LIST_END_FLAG:
			
 
				                             line[ListLineTag.IS_LIST_END_LINE] = True
			
 
				                             if i + 1 < len(block['lines']):
			
 
				                                 block['lines'][i + 1][ListLineTag.IS_LIST_START_LINE] = True
			
@@ -187,17 +184,18 @@ def __is_list_or_index_block(block):
                         if line_start_flag:
                             line[ListLineTag.IS_LIST_START_LINE] = True
                             line_start_flag = False
-                        # elif abs(block['bbox_fs'][2] - line['bbox'][2]) > line_height:
+
                         if abs(block['bbox_fs'][2] - line['bbox'][2]) > 0.1 * block_weight:
                             line[ListLineTag.IS_LIST_END_LINE] = True
                             line_start_flag = True
-            # A special indented ordered list: start lines are not flush left and begin with a digit; end lines end with IS_LIST_END_LINE, and their count equals that of the start lines
-            elif num_start_count >= 2 and num_start_count == flag_end_count:  # keep it simple: ignore the not-flush-left condition for now
+            # A special indented ordered list: start lines are not flush left and begin with a digit; end lines end with IS_LIST_END_FLAG, and their count equals that of the start lines
+            elif num_start_count >= 2 and num_start_count == flag_end_count:
                 for i, line in enumerate(block['lines']):
-                    if lines_text_list[i][0].isdigit():
-                        line[ListLineTag.IS_LIST_START_LINE] = True
-                    if lines_text_list[i][-1] in LIST_END_FLAG:
-                        line[ListLineTag.IS_LIST_END_LINE] = True
+                    if len(lines_text_list[i]) > 0:
+                        if lines_text_list[i][0].isdigit():
+                            line[ListLineTag.IS_LIST_START_LINE] = True
+                        if lines_text_list[i][-1] in LIST_END_FLAG:
+                            line[ListLineTag.IS_LIST_END_LINE] = True
             else:
                 # Handle ordinary indented lists
                 for line in block['lines']:
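The recurring fix in this file is an emptiness guard before indexing `lines_text_list[i][0]` or `lines_text_list[i][-1]`: in Python, indexing an empty string raises `IndexError`. A minimal standalone sketch of the guarded tagging step and the 80% ratio test, assuming hypothetical helpers `tag_list_lines` and `mostly`; the real contents of `LIST_END_FLAG` are not shown in this diff, so the end flags are passed in by the caller.

```python
from typing import Iterable, List, Tuple


def tag_list_lines(lines_text: List[str], end_flags: Iterable[str]) -> List[Tuple[bool, bool]]:
    """Return (is_start, is_end) for each line of text.

    Hypothetical helper mirroring the guarded branch above: a line counts as a
    list start if it begins with a digit, and as a list end if its last
    character is one of end_flags. Empty lines are tagged (False, False)
    instead of raising IndexError on ''[0] or ''[-1].
    """
    flags = set(end_flags)
    tags = []
    for text in lines_text:
        if len(text) > 0:
            tags.append((text[0].isdigit(), text[-1] in flags))
        else:
            tags.append((False, False))
    return tags


def mostly(count: int, total: int, threshold: float = 0.8) -> bool:
    """Ratio test in the style of line_num_flag / line_end_flag (>= 80% of lines)."""
    return total > 0 and count / total >= threshold


if __name__ == "__main__":
    lines = ["1. Intro.", "", "2. Setup", "details;"]
    print(tag_list_lines(lines, [".", ";"]))
    # [(True, True), (False, False), (True, False), (False, True)]
```

The `total > 0` check plays the same role as the `if len(lines_text_list) > 0:` guard that wraps the ratio tests in the diff.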
--- a/magic_pdf/pdf_parse_union_core_v2.py
+++ b/magic_pdf/pdf_parse_union_core_v2.py
@@ -30,8 +30,8 @@ from magic_pdf.pre_proc.equations_replace import (
 from magic_pdf.pre_proc.ocr_detect_all_bboxes import \
     ocr_prepare_bboxes_for_layout_split_v2
 from magic_pdf.pre_proc.ocr_dict_merge import (fill_spans_in_blocks,
-                                               fix_block_spans,
-                                               fix_discarded_block, fix_block_spans_v2)
+                                               fix_discarded_block,
+                                               fix_block_spans_v2)
 from magic_pdf.pre_proc.ocr_span_list_modify import (
     get_qa_need_list_v2, remove_overlaps_low_confidence_spans,
     remove_overlaps_min_spans)
@@ -164,8 +164,8 @@ class ModelSingleton:
 
 
 def do_predict(boxes: List[List[int]], model) -> List[int]:
-    from magic_pdf.model.v3.helpers import (boxes2inputs, parse_logits,
-                                            prepare_inputs)
+    from magic_pdf.model.sub_modules.reading_oreder.layoutreader.helpers import (boxes2inputs, parse_logits,
+                                                                                 prepare_inputs)
 
     inputs = boxes2inputs(boxes)
     inputs = prepare_inputs(inputs, model)
@@ -174,23 +174,57 @@ def do_predict(boxes: List[List[int]], model) -> List[int]:
 
 
 def cal_block_index(fix_blocks, sorted_bboxes):
-    for block in fix_blocks:
 
-        line_index_list = []
-        if len(block['lines']) == 0:
-            block['index'] = sorted_bboxes.index(block['bbox'])
-        else:
+    if sorted_bboxes is not None:
+        # Sort with layoutreader
+        for block in fix_blocks:
+            line_index_list = []
+            if len(block['lines']) == 0:
+                block['index'] = sorted_bboxes.index(block['bbox'])
+            else:
+                for line in block['lines']:
+                    line['index'] = sorted_bboxes.index(line['bbox'])
+                    line_index_list.append(line['index'])
+                median_value = statistics.median(line_index_list)
+                block['index'] = median_value
+
+            # Drop the virtual lines of image/table body blocks and backfill them with real_lines
+            if block['type'] in [BlockType.ImageBody, BlockType.TableBody]:
+                block['virtual_lines'] = copy.deepcopy(block['lines'])
+                block['lines'] = copy.deepcopy(block['real_lines'])
+                del block['real_lines']
+    else:
+        # Sort with xycut
+        block_bboxes = []
+        for block in fix_blocks:
+            block_bboxes.append(block['bbox'])
+
+            # Drop the virtual lines of image/table body blocks and backfill them with real_lines
+            if block['type'] in [BlockType.ImageBody, BlockType.TableBody]:
+                block['virtual_lines'] = copy.deepcopy(block['lines'])
+                block['lines'] = copy.deepcopy(block['real_lines'])
+                del block['real_lines']
+
+        import numpy as np
+        from magic_pdf.model.sub_modules.reading_oreder.layoutreader.xycut import recursive_xy_cut
+
+        random_boxes = np.array(block_bboxes)
+        np.random.shuffle(random_boxes)
+        res = []
+        recursive_xy_cut(np.asarray(random_boxes).astype(int), np.arange(len(block_bboxes)), res)
+        assert len(res) == len(block_bboxes)
+        sorted_boxes = random_boxes[np.array(res)].tolist()
+
+        for i, block in enumerate(fix_blocks):
+            block['index'] = sorted_boxes.index(block['bbox'])
+
+        # Generate line indices
+        sorted_blocks = sorted(fix_blocks, key=lambda b: b['index'])
+        line_inedx = 1
+        for block in sorted_blocks:
             for line in block['lines']:
-                line['index'] = sorted_bboxes.index(line['bbox'])
-                line_index_list.append(line['index'])
-            median_value = statistics.median(line_index_list)
-            block['index'] = median_value
-
-        # Drop the virtual lines of image/table body blocks and backfill them with real_lines
-        if block['type'] in [BlockType.ImageBody, BlockType.TableBody]:
-            block['virtual_lines'] = copy.deepcopy(block['lines'])
-            block['lines'] = copy.deepcopy(block['real_lines'])
-            del block['real_lines']
+                line['index'] = line_inedx
+                line_inedx += 1
 
     return fix_blocks
 
@@ -264,6 +298,9 @@ def sort_lines_by_model(fix_blocks, page_w, page_h, line_height):
                 block['lines'].append({'bbox': line, 'spans': []})
             page_line_list.extend(lines)
 
+    if len(page_line_list) > 200:  # layoutreader supports at most 512 lines
+        return None
+
     # Sort with layoutreader
     x_scale = 1000.0 / page_w
     y_scale = 1000.0 / page_h
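When `sort_lines_by_model` returns `None` for pages with more than 200 lines, `cal_block_index` falls back to `recursive_xy_cut`, whose body is not part of this diff. The sketch below is an independent, simplified illustration of the general recursive XY-cut idea, not MinerU's implementation: cut the boxes at horizontal whitespace gaps first, then at vertical gaps, recurse into each piece, and emit indices in reading order. The function names `xy_cut_order` and `_split_at_gaps` and the `(x0, y0, x1, y1)` box convention with y growing downward are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def _split_at_gaps(indices: List[int], boxes: Sequence[Box], axis: int) -> List[List[int]]:
    """Group boxes whose projections on one axis overlap or touch.

    axis=1 projects onto y (cuts into horizontal bands, top to bottom);
    axis=0 projects onto x (cuts into columns, left to right).
    """
    lo, hi = axis, axis + 2
    order = sorted(indices, key=lambda i: boxes[i][lo])  # stable: ties keep input order
    groups = [[order[0]]]
    reach = boxes[order[0]][hi]
    for i in order[1:]:
        if boxes[i][lo] > reach:  # a whitespace gap starts a new group
            groups.append([i])
            reach = boxes[i][hi]
        else:
            groups[-1].append(i)
            reach = max(reach, boxes[i][hi])
    return groups


def xy_cut_order(boxes: Sequence[Box]) -> List[int]:
    """Return box indices in reading order using a simple recursive XY-cut."""
    result: List[int] = []

    def recurse(indices: List[int]) -> None:
        if len(indices) == 1:
            result.append(indices[0])
            return
        for axis in (1, 0):  # try horizontal cuts first, then vertical cuts
            groups = _split_at_gaps(indices, boxes, axis)
            if len(groups) > 1:
                for group in groups:
                    recurse(group)
                return
        # No gap on either axis: fall back to top-to-bottom, left-to-right.
        result.extend(sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0])))

    if boxes:
        recurse(list(range(len(boxes))))
    return result


if __name__ == "__main__":
    # A full-width header above a two-column body, given in scrambled order.
    page = [
        (60, 20, 100, 50),  # 0: right column
        (0, 40, 40, 50),    # 1: left column, second paragraph
        (0, 0, 100, 10),    # 2: header
        (0, 20, 40, 30),    # 3: left column, first paragraph
    ]
    print(xy_cut_order(page))  # [2, 3, 1, 0]
```

Unlike the diff, this sketch does not shuffle the boxes first; ties are broken deterministically by input order.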
--- a/magic_pdf/resources/model_config/model_configs.yaml
+++ b/magic_pdf/resources/model_config/model_configs.yaml
@@ -4,4 +4,5 @@ weights:
   yolo_v8_mfd: MFD/YOLO/yolo_v8_ft.pt
   unimernet_small: MFR/unimernet_small
   struct_eqtable: TabRec/StructEqTable
-  tablemaster: TabRec/TableMaster
+  tablemaster: TabRec/TableMaster
+  rapid_table: TabRec/RapidTable
--- a/magic_pdf/tools/common.py
+++ b/magic_pdf/tools/common.py
@@ -14,6 +14,9 @@ from magic_pdf.pipe.TXTPipe import TXTPipe
 from magic_pdf.pipe.UNIPipe import UNIPipe
 from magic_pdf.rw.AbsReaderWriter import AbsReaderWriter
 from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter
+import fitz
+# from io import BytesIO
+# from pypdf import PdfReader, PdfWriter
 
 
 def prepare_env(output_dir, pdf_file_name, method):
@@ -26,6 +29,42 @@ def prepare_env(output_dir, pdf_file_name, method):
     return local_image_dir, local_md_dir
 
 
+# def convert_pdf_bytes_to_bytes_by_pypdf(pdf_bytes, start_page_id=0, end_page_id=None):
+#     # Wrap the byte data in a BytesIO object
+#     pdf_file = BytesIO(pdf_bytes)
+#     # Read the PDF bytes
+#     reader = PdfReader(pdf_file)
+#     # Create a new PDF writer
+#     writer = PdfWriter()
+#     # Add the pages in the requested range to the new PDF writer
+#     end_page_id = end_page_id if end_page_id is not None and end_page_id >= 0 else len(reader.pages) - 1
+#     if end_page_id > len(reader.pages) - 1:
+#         logger.warning("end_page_id is out of range, use pdf_docs length")
+#         end_page_id = len(reader.pages) - 1
+#     for i, page in enumerate(reader.pages):
+#         if start_page_id <= i <= end_page_id:
+#             writer.add_page(page)
+#     # Create a byte buffer to hold the output PDF data
+#     output_buffer = BytesIO()
+#     # Write the PDF into the byte buffer
+#     writer.write(output_buffer)
+#     # Get the contents of the byte buffer
+#     converted_pdf_bytes = output_buffer.getvalue()
+#     return converted_pdf_bytes
+
+
+def convert_pdf_bytes_to_bytes_by_pymupdf(pdf_bytes, start_page_id=0, end_page_id=None):
+    document = fitz.open("pdf", pdf_bytes)
+    output_document = fitz.open()
+    end_page_id = end_page_id if end_page_id is not None and end_page_id >= 0 else len(document) - 1
+    if end_page_id > len(document) - 1:
+        logger.warning("end_page_id is out of range, use pdf_docs length")
+        end_page_id = len(document) - 1
+    output_document.insert_pdf(document, from_page=start_page_id, to_page=end_page_id)
+    output_bytes = output_document.tobytes()
+    return output_bytes
+
+
 def do_parse(
     output_dir,
     pdf_file_name,
@@ -55,6 +94,8 @@ def do_parse(
         f_draw_model_bbox = True
         f_draw_line_sort_bbox = True
 
+    pdf_bytes = convert_pdf_bytes_to_bytes_by_pymupdf(pdf_bytes, start_page_id, end_page_id)
+
     orig_model_list = copy.deepcopy(model_list)
     local_image_dir, local_md_dir = prepare_env(output_dir, pdf_file_name,
                                                 parse_method)
@@ -66,15 +107,18 @@ def do_parse(
     if parse_method == 'auto':
         jso_useful_key = {'_pdf_type': '', 'model_list': model_list}
         pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer, is_debug=True,
-                       start_page_id=start_page_id, end_page_id=end_page_id, lang=lang,
+                       # start_page_id=start_page_id, end_page_id=end_page_id,
+                       lang=lang,
                        layout_model=layout_model, formula_enable=formula_enable, table_enable=table_enable)
     elif parse_method == 'txt':
         pipe = TXTPipe(pdf_bytes, model_list, image_writer, is_debug=True,
-                       start_page_id=start_page_id, end_page_id=end_page_id, lang=lang,
+                       # start_page_id=start_page_id, end_page_id=end_page_id,
+                       lang=lang,
                        layout_model=layout_model, formula_enable=formula_enable, table_enable=table_enable)
     elif parse_method == 'ocr':
         pipe = OCRPipe(pdf_bytes, model_list, image_writer, is_debug=True,
-                       start_page_id=start_page_id, end_page_id=end_page_id, lang=lang,
+                       # start_page_id=start_page_id, end_page_id=end_page_id,
+                       lang=lang,
                        layout_model=layout_model, formula_enable=formula_enable, table_enable=table_enable)
     else:
         logger.error('unknown parse method')
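`convert_pdf_bytes_to_bytes_by_pymupdf` now crops the PDF before parsing, so the pipes no longer receive `start_page_id`/`end_page_id`, and page 0 of the cropped document corresponds to the original `start_page_id`. The clamping rule it applies can be isolated as a pure function for illustration; `resolve_page_range` is a hypothetical name, not part of magic-pdf. PyMuPDF's `insert_pdf` treats `to_page` as inclusive, which is why the end index is clamped to `len(document) - 1`.

```python
from typing import Optional, Tuple


def resolve_page_range(page_count: int, start_page_id: int = 0,
                       end_page_id: Optional[int] = None) -> Tuple[int, int]:
    """Return the inclusive (start, end) page range that would be copied.

    Mirrors the clamping in convert_pdf_bytes_to_bytes_by_pymupdf: a missing
    or negative end_page_id means "through the last page", and an end index
    past the last page is clamped to it. start_page_id is passed through as-is.
    """
    last_page = page_count - 1
    if end_page_id is None or end_page_id < 0:
        end_page_id = last_page
    if end_page_id > last_page:
        end_page_id = last_page  # the original code also logs a warning here
    return start_page_id, end_page_id
```

For a 10-page document, `resolve_page_range(10, 1, 4)` gives `(1, 4)`, so four pages are kept, while `resolve_page_range(10, 2, 50)` is clamped to `(2, 9)`.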
--- a/next_docs/README.md
+++ b/next_docs/README.md
--- a/next_docs/README_zh-CN.md
+++ b/next_docs/README_zh-CN.md
--- a/next_docs/en/_static/image/ReadTheDocs.svg
+++ b/next_docs/en/_static/image/ReadTheDocs.svg
--- a/next_docs/en/additional_notes/changelog.rst
+++ b/next_docs/en/additional_notes/changelog.rst
@@ -1,26 +0,0 @@
-
-
-Changelog
-=========
-
--  2024/09/27 Version 0.8.1 released, Fixed some bugs, and providing a
-   `localized deployment version <projects/web_demo/README.md>`__ of the
-   `online
-   demo <https://opendatalab.com/OpenSourceTools/Extractor/PDF/>`__ and
-   the `front-end interface <projects/web/README.md>`__.
--  2024/09/09: Version 0.8.0 released, supporting fast deployment with
-   Dockerfile, and launching demos on Huggingface and Modelscope.
--  2024/08/30: Version 0.7.1 released, add paddle tablemaster table
-   recognition option
--  2024/08/09: Version 0.7.0b1 released, simplified installation
-   process, added table recognition functionality
--  2024/08/01: Version 0.6.2b1 released, optimized dependency conflict
-   issues and installation documentation
--  2024/07/05: Initial open-source release
-
-
-.. warning::
-
-   fix ``localized deployment version`` and ``front-end interface``
-
-
--- a/next_docs/en/additional_notes/faq.rst
+++ b/next_docs/en/additional_notes/faq.rst
@@ -74,3 +74,15 @@ CUDA version used by Paddle needs to be upgraded.
    pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
 
 Reference: https://github.com/opendatalab/MinerU/issues/558
+
+
+7. On some Linux servers, the program immediately reports an error ``Illegal instruction (core dumped)``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This may be because the server's CPU does not support the AVX/AVX2
+instruction set, or because the CPU supports it but the system
+administrator has disabled it. Ask the system administrator to lift the
+restriction, or switch to a different server.
+
+References: https://github.com/opendatalab/MinerU/issues/591 and
+https://github.com/opendatalab/MinerU/issues/736
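On Linux, one way to check for the instruction sets mentioned above is to look for the `avx` and `avx2` entries on the `flags` lines of `/proc/cpuinfo`. A small sketch, assuming an x86 Linux host; the function names `cpu_flags` and `avx_support` are illustrative only and not part of magic-pdf.

```python
from pathlib import Path
from typing import Dict, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the feature flags listed on 'flags' lines of /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_support(cpuinfo_text: str) -> Dict[str, bool]:
    """Report whether the avx and avx2 flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}


if __name__ == "__main__":
    # Linux only: /proc/cpuinfo does not exist on Windows or macOS.
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(avx_support(path.read_text()))
    else:
        print("/proc/cpuinfo not found; this check is Linux-specific")
```

If either value is `False`, the `Illegal instruction` error described above is a plausible cause.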
--- a/next_docs/en/additional_notes/known_issues.rst
+++ b/next_docs/en/additional_notes/known_issues.rst
@@ -1,19 +1,20 @@
 Known Issues
 ============
 
--  Reading order is based on the model’s sorting of text distribution in
-   space, which may become disordered under extremely complex layouts.
+-  Reading order is determined by the model based on the spatial
+   distribution of readable content, and may be out of order in some
+   areas under extremely complex layouts.
 -  Vertical text is not supported.
--  Tables of contents and lists are recognized through rules; a few
-   uncommon list formats may not be identified.
--  Only one level of headings is supported; hierarchical heading levels
-   are currently not supported.
+-  Tables of contents and lists are recognized through rules, and some
+   uncommon list formats may not be recognized.
+-  Only one level of headings is supported; hierarchical headings are
+   not currently supported.
 -  Code blocks are not yet supported in the layout model.
--  Comic books, art books, elementary school textbooks, and exercise
-   books are not well-parsed yet
--  Enabling OCR may produce better results in PDFs with a high density
-   of formulas
--  If you are processing PDFs with a large number of formulas, it is
-   strongly recommended to enable the OCR function. When using PyMuPDF
-   to extract text, overlapping text lines can occur, leading to
-   inaccurate formula insertion positions.
+-  Comic books, art albums, primary school textbooks, and exercise
+   books are not yet parsed well.
+-  Table recognition may produce row/column errors in complex
+   tables.
+-  OCR may misrecognize characters in PDFs written in less common
+   languages (e.g., diacritical marks in Latin script, easily
+   confused characters in Arabic script).
+-  Some formulas may not render correctly in Markdown.
--- a/next_docs/en/api.rst
+++ b/next_docs/en/api.rst
@@ -7,4 +7,3 @@
    api/read_api
    api/schemas
    api/io
-   api/classes
--- a/next_docs/en/api/classes.rst
+++ b/next_docs/en/api/classes.rst
@@ -1,14 +0,0 @@
-Class Hierarchy
-===============
-
-.. inheritance-diagram:: magic_pdf.data.io.base magic_pdf.data.io.http magic_pdf.data.io.s3
-   :parts: 2
-
-
-.. inheritance-diagram:: magic_pdf.data.dataset
-   :parts: 2
-
-
-.. inheritance-diagram:: magic_pdf.data.data_reader_writer.base magic_pdf.data.data_reader_writer.filebase magic_pdf.data.data_reader_writer.multi_bucket_s3
-   :parts: 2
-
--- a/next_docs/en/api/utils.rst
+++ b/next_docs/en/api/utils.rst
@@ -1 +0,0 @@
-
--- a/next_docs/en/conf.py
+++ b/next_docs/en/conf.py
@@ -95,7 +95,7 @@ language = 'en'
 html_theme = 'sphinx_book_theme'
 html_logo = '_static/image/logo.png'
 html_theme_options = {
-    'path_to_docs': 'docs/en',
+    'path_to_docs': 'next_docs/en',
     'repository_url': 'https://github.com/opendatalab/MinerU',
     'use_repository_button': True,
 }
--- a/next_docs/en/index.rst
+++ b/next_docs/en/index.rst
@@ -46,20 +46,29 @@ the relevant PDF**.
 Key Features
 ------------
 
--  Removes elements such as headers, footers, footnotes, and page
-   numbers while maintaining semantic continuity
--  Outputs text in a human-readable order from multi-column documents
--  Retains the original structure of the document, including titles,
-   paragraphs, and lists
--  Extracts images, image captions, tables, and table captions
--  Automatically recognizes formulas in the document and converts them
-   to LaTeX
--  Automatically recognizes tables in the document and converts them to
-   LaTeX
--  Automatically detects and enables OCR for corrupted PDFs
--  Supports both CPU and GPU environments
--  Supports Windows, Linux, and Mac platforms
-
+-  Remove headers, footers, footnotes, page numbers, etc., to ensure
+   semantic coherence.
+-  Output text in human-readable order, suitable for single-column,
+   multi-column, and complex layouts.
+-  Preserve the structure of the original document, including headings,
+   paragraphs, lists, etc.
+-  Extract images, image descriptions, tables, table titles, and
+   footnotes.
+-  Automatically recognize and convert formulas in the document to LaTeX
+   format.
+-  Automatically recognize and convert tables in the document to LaTeX
+   or HTML format.
+-  Automatically detect scanned PDFs and garbled PDFs and enable OCR
+   functionality.
+-  OCR supports detection and recognition of 84 languages.
+-  Supports multiple output formats, such as multimodal and NLP
+   Markdown, JSON sorted by reading order, and rich intermediate
+   formats.
+-  Supports various visualization results, including layout
+   visualization and span visualization, for efficient confirmation of
+   output quality.
+-  Supports both CPU and GPU environments.
+-  Compatible with Windows, Linux, and Mac platforms.
 
 User Guide
 -------------
@@ -91,14 +100,6 @@ Additional Notes
 
    additional_notes/known_issues
    additional_notes/faq
-   additional_notes/changelog
    additional_notes/glossary
 
 
-Projects 
----------
-.. toctree::
-   :maxdepth: 1
-   :caption: Projects
-
-   projects
--- a/next_docs/en/projects.rst
+++ b/next_docs/en/projects.rst
@@ -1,13 +0,0 @@
-
-
-
-llama_index_rag 
-===============
-
-
-gradio_app
-============
-
-
-other projects
-===============
--- a/next_docs/en/user_guide/data/data_reader_writer.rst
+++ b/next_docs/en/user_guide/data/data_reader_writer.rst
@@ -87,6 +87,8 @@ Read Examples
 
 .. code:: python
 
+    from magic_pdf.data.data_reader_writer import *
+
     # file based related 
     file_based_reader1 = FileBasedDataReader('')
 
@@ -142,6 +144,8 @@ Write Examples
 
 .. code:: python
 
+    from magic_pdf.data.data_reader_writer import *
+
     # file based related 
     file_based_writer1 = FileBasedDataWriter('')
 
@@ -201,4 +205,4 @@ Write Examples
     s3_writer1.write('s3://test_bucket/efg', '123'.encode())
 
 
-Check :doc:`../../api/classes` for more intuitions or check :doc:`../../api/data_reader_writer` for more details
+Check :doc:`../../api/data_reader_writer` for more details.
--- a/next_docs/en/user_guide/data/dataset.rst
+++ b/next_docs/en/user_guide/data/dataset.rst
@@ -36,5 +36,5 @@ Extract chars via third-party library, currently we use ``pymupdf``.
 
 
 
-Check :doc:`../../api/classes` for more intuitions or check :doc:`../../api/dataset` for more details
+Check :doc:`../../api/dataset` for more details.
 
--- a/next_docs/en/user_guide/data/io.rst
+++ b/next_docs/en/user_guide/data/io.rst
@@ -21,5 +21,5 @@ if MinerU have not provide the suitable classes. It is easy to implement new cla
         def write(self, path: str, data: bytes) -> None:
             pass
 
-Check :doc:`../../api/classes` for more intuitions or check :doc:`../../api/io` for more details
+Check :doc:`../../api/io` for more details.
 
--- a/next_docs/en/user_guide/data/read_api.rst
+++ b/next_docs/en/user_guide/data/read_api.rst
@@ -18,6 +18,8 @@ Read the contet from jsonl which may located on local machine or remote s3. if y
 
 .. code:: python
 
+    from magic_pdf.data.io.read_api import *
+
     # read jsonl from local machine 
     datasets = read_jsonl("tt.jsonl", None)
 
@@ -33,6 +35,8 @@ Read pdf from path or directory.
 
 .. code:: python
 
+    from magic_pdf.data.io.read_api import *
+
     # read pdf path
     datasets = read_local_pdfs("tt.pdf")
 
@@ -47,10 +51,11 @@ Read images from path or directory
 
 .. code:: python 
 
+    from magic_pdf.data.io.read_api import *
+
     # read from image path 
     datasets = read_local_images("tt.png")
 
-
     # read files from directory that endswith suffix in suffixes array 
     datasets = read_local_images("images/", suffixes=["png", "jpg"])
 
--- a/next_docs/en/user_guide/install/boost_with_cuda.rst
+++ b/next_docs/en/user_guide/install/boost_with_cuda.rst
@@ -9,16 +9,18 @@ appropriate guide based on your system:
 
 -  :ref:`ubuntu_22_04_lts_section`
 -  :ref:`windows_10_or_11_section`
+-  Quick Deployment with Docker
 
--  Quick Deployment with Docker > Docker requires a GPU with at least
-   16GB of VRAM, and all acceleration features are enabled by default.
+.. admonition:: Important
+   :class: tip
 
-.. note:: 
+   Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
 
-   Before running this Docker, you can use the following command to
-   check if your device supports CUDA acceleration on Docker. 
+   Before running the Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.
 
-   bash  docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
+   .. code-block:: bash
+
+      docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
 
 .. code:: sh
 
@@ -42,8 +44,9 @@ Ubuntu 22.04 LTS
 If you see information similar to the following, it means that the
 NVIDIA drivers are already installed, and you can skip Step 2.
 
-Notice:``CUDA Version`` should be >= 12.1, If the displayed version
-number is less than 12.1, please upgrade the driver.
+.. note::
+
+   ``CUDA Version`` should be >= 12.1. If the displayed version number is lower than 12.1, please upgrade the driver.
 
 .. code:: text
 
@@ -105,8 +108,10 @@ Specify Python version 3.10.
 
    pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
 
-❗ After installation, make sure to check the version of ``magic-pdf``
-using the following command:
+.. admonition:: Important
+    :class: tip
+
+    ❗ After installation, make sure to check the version of ``magic-pdf`` using the following command:
 
 .. code:: sh
 
@@ -127,7 +132,10 @@ the script will automatically generate a ``magic-pdf.json`` file in the
 user directory and configure the default model path. You can find the
 ``magic-pdf.json`` file in your user directory.
 
-   The user directory for Linux is “/home/username”.
+.. admonition:: Tip
+    :class: tip
+
+    The user directory for Linux is “/home/username”.
 
 8. First Run
 ~~~~~~~~~~~~
@@ -137,7 +145,7 @@ Download a sample file from the repository and test it.
 .. code:: sh
 
    wget https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf
-   magic-pdf -p small_ocr.pdf
+   magic-pdf -p small_ocr.pdf -o ./output
 
 9. Test CUDA Acceleration
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -145,10 +153,6 @@ Download a sample file from the repository and test it.
 If your graphics card has at least **8GB** of VRAM, follow these steps
 to test CUDA acceleration:
 
-   ❗ Due to the extremely limited nature of 8GB VRAM for running this
-   application, you need to close all other programs using VRAM to
-   ensure that 8GB of VRAM is available when running this application.
-
 1. Modify the value of ``"device-mode"`` in the ``magic-pdf.json``
    configuration file located in your home directory.
 
@@ -162,7 +166,7 @@ to test CUDA acceleration:
 
    .. code:: sh
 
-      magic-pdf -p small_ocr.pdf
+      magic-pdf -p small_ocr.pdf -o ./output
 
 10. Enable CUDA Acceleration for OCR
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -178,7 +182,9 @@ to test CUDA acceleration:
 
    .. code:: sh
 
-      magic-pdf -p small_ocr.pdf
+      magic-pdf -p small_ocr.pdf -o ./output
+
+
 
 .. _windows_10_or_11_section:
 
@@ -218,16 +224,16 @@ Python version must be 3.10.
 
    pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
 
-..
+.. admonition:: Important
+    :class: tip
 
-   ❗️After installation, verify the version of ``magic-pdf``:
+    ❗️After installation, verify the version of ``magic-pdf``:
 
-   .. code:: bash
+    .. code:: bash
 
       magic-pdf --version
 
-   If the version number is less than 0.7.0, please report it in the
-   issues section.
+    If the version number is less than 0.7.0, please report it in the issues section.
 
 5. Download Models
 ~~~~~~~~~~~~~~~~~~
@@ -242,7 +248,10 @@ the script will automatically generate a ``magic-pdf.json`` file in the
 user directory and configure the default model path. You can find the
 ``magic-pdf.json`` file in your user directory.
 
-   The user directory for Windows is “C:/Users/username”.
+.. admonition:: Tip
+    :class: tip
+
+    The user directory for Windows is “C:/Users/username”.
 
 7. First Run
 ~~~~~~~~~~~~
@@ -252,7 +261,7 @@ Download a sample file from the repository and test it.
 .. code:: powershell
 
      wget https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf -O small_ocr.pdf
-     magic-pdf -p small_ocr.pdf
+     magic-pdf -p small_ocr.pdf -o ./output
 
 8. Test CUDA Acceleration
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -260,27 +269,23 @@ Download a sample file from the repository and test it.
 If your graphics card has at least 8GB of VRAM, follow these steps to
 test CUDA-accelerated parsing performance.
 
-   ❗ Due to the extremely limited nature of 8GB VRAM for running this
-   application, you need to close all other programs using VRAM to
-   ensure that 8GB of VRAM is available when running this application.
-
-1. **Overwrite the installation of torch and torchvision** supporting
-   CUDA.
+1. **Force-reinstall CUDA-enabled torch and torchvision** over the existing installation.
 
-   ::
+.. code:: sh
 
-      pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
+   pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
 
-   ..
+.. admonition:: Important
+    :class: tip
 
-      ❗️Ensure the following versions are specified in the command:
+    ❗️Ensure the following versions are specified in the command:
 
-      ::
+
+    .. code:: sh
 
          torch==2.3.1 torchvision==0.18.1
 
-      These are the highest versions we support. Installing higher
-      versions without specifying them will cause the program to fail.
+    These are the highest versions we support. If you do not pin them, newer versions will be installed and the program will fail.
 
 2. **Modify the value of ``"device-mode"``** in the ``magic-pdf.json``
    configuration file located in your user directory.
@@ -295,7 +300,7 @@ test CUDA-accelerated parsing performance.
 
    ::
 
-      magic-pdf -p small_ocr.pdf
+      magic-pdf -p small_ocr.pdf -o ./output
 
 9. Enable CUDA Acceleration for OCR
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -311,5 +316,4 @@ test CUDA-accelerated parsing performance.
 
    ::
 
-      magic-pdf -p small_ocr.pdf
-
+      magic-pdf -p small_ocr.pdf -o ./output
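Step 2 of the Windows guide edits ``"device-mode"`` in ``magic-pdf.json`` by hand. The sketch below makes the same edit from Python. It is not part of MinerU. ``set_device_mode`` is a hypothetical helper. It assumes ``magic-pdf.json`` sits in the user directory returned by ``Path.home()`` (for example ``C:/Users/username`` on Windows). It also assumes ``"cuda"`` is the value that selects GPU inference.

```python
import json
from pathlib import Path
from typing import Optional


def set_device_mode(mode: str, config_path: Optional[Path] = None) -> dict:
    """Rewrite the "device-mode" key of magic-pdf.json and return the new config.

    Hypothetical helper: by default it assumes magic-pdf.json lives in the
    user directory (Path.home(), e.g. C:/Users/username on Windows).
    """
    path = config_path if config_path is not None else Path.home() / "magic-pdf.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    # "cuda" is an assumed value for GPU mode; every other key is left unchanged.
    config["device-mode"] = mode
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config
```

After ``set_device_mode("cuda")``, re-running ``magic-pdf -p small_ocr.pdf -o ./output`` should exercise the GPU path. The helper reads and writes the whole file, so it keeps keys such as ``models-dir`` intact.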
			
--- a/next_docs/en/user_guide/install/install.rst
+++ b/next_docs/en/user_guide/install/install.rst
@@ -1,87 +1,90 @@
 
 Install 
 ===============================================================
-If you encounter any installation issues, please first consult the FAQ.
-If the parsing results are not as expected, refer to the Known Issues.
-There are three different ways to experience MinerU
-
-Pre-installation Notice—Hardware and Software Environment Support
-------------------------------------------------------------------
-
-To ensure the stability and reliability of the project, we only optimize
-and test for specific hardware and software environments during
-development. This ensures that users deploying and running the project
-on recommended system configurations will get the best performance with
-the fewest compatibility issues.
-
-By focusing resources on the mainline environment, our team can more
-efficiently resolve potential bugs and develop new features.
-
-In non-mainline environments, due to the diversity of hardware and
-software configurations, as well as third-party dependency compatibility
-issues, we cannot guarantee 100% project availability. Therefore, for
-users who wish to use this project in non-recommended environments, we
-suggest carefully reading the documentation and FAQ first. Most issues
-already have corresponding solutions in the FAQ. We also encourage
-community feedback to help us gradually expand support.
+If you encounter any installation issues, please first consult the :doc:`../../additional_notes/faq`.
+If the parsing results are not as expected, refer to the :doc:`../../additional_notes/known_issues`.
+
+
+.. admonition:: Warning
+    :class: tip
+
+    **Pre-installation Notice: Hardware and Software Environment Support**
+
+    To ensure the stability and reliability of the project, we only optimize
+    and test for specific hardware and software environments during
+    development. This ensures that users deploying and running the project
+    on recommended system configurations will get the best performance with
+    the fewest compatibility issues.
+
+    By focusing resources on the mainline environment, our team can more
+    efficiently resolve potential bugs and develop new features.
+
+    In non-mainline environments, due to the diversity of hardware and
+    software configurations, as well as third-party dependency compatibility
+    issues, we cannot guarantee that the project will work in every setup. Therefore,
+    users who wish to use this project in non-recommended environments should
+    read the documentation and FAQ carefully first. Most issues
+    already have corresponding solutions in the FAQ. We also encourage
+    community feedback to help us gradually expand support.
 
 .. raw:: html
 
-   <style>
-      table, th, td {
-      border: 1px solid black;
-      border-collapse: collapse;
-      }
-   </style>
-   <table>
-    <tr>
-        <td colspan="3" rowspan="2">Operating System</td>
-    </tr>
-    <tr>
-        <td>Ubuntu 22.04 LTS</td>
-        <td>Windows 10 / 11</td>
-        <td>macOS 11+</td>
-    </tr>
-    <tr>
-        <td colspan="3">CPU</td>
-        <td>x86_64</td>
-        <td>x86_64</td>
-        <td>x86_64 / arm64</td>
-    </tr>
-    <tr>
-        <td colspan="3">Memory</td>
-        <td colspan="3">16GB or more, recommended 32GB+</td>
-    </tr>
-    <tr>
-        <td colspan="3">Python Version</td>
-        <td colspan="3">3.10</td>
-    </tr>
-    <tr>
-        <td colspan="3">Nvidia Driver Version</td>
-        <td>latest (Proprietary Driver)</td>
-        <td>latest</td>
-        <td>None</td>
-    </tr>
-    <tr>
-        <td colspan="3">CUDA Environment</td>
-        <td>Automatic installation [12.1 (pytorch) + 11.8 (paddle)]</td>
-        <td>11.8 (manual installation) + cuDNN v8.7.0 (manual installation)</td>
-        <td>None</td>
-    </tr>
-    <tr>
-        <td rowspan="2">GPU Hardware Support List</td>
-        <td colspan="2">Minimum Requirement 8G+ VRAM</td>
-        <td colspan="2">3060ti/3070/3080/3080ti/4060/4070/4070ti<br>
-        8G VRAM enables layout, formula recognition acceleration and OCR acceleration</td>
-        <td rowspan="2">None</td>
-    </tr>
-    <tr>
-        <td colspan="2">Recommended Configuration 16G+ VRAM</td>
-        <td colspan="2">3090/3090ti/4070ti super/4080/4090<br>
-        16G VRAM or more can enable layout, formula recognition, OCR acceleration and table recognition acceleration simultaneously
-        </td>
-    </tr>
-   </table>
+    <style>
+        table, th, td {
+        border: 1px solid black;
+        border-collapse: collapse;
+        }
+    </style>
+    <table>
+        <tr>
+            <td colspan="3" rowspan="2">Operating System</td>
+        </tr>
+        <tr>
+            <td>Ubuntu 22.04 LTS</td>
+            <td>Windows 10 / 11</td>
+            <td>macOS 11+</td>
+        </tr>
+        <tr>
+            <td colspan="3">CPU</td>
+            <td>x86_64 (ARM Linux is not supported)</td>
+            <td>x86_64 (ARM Windows is not supported)</td>
+            <td>x86_64 / arm64</td>
+        </tr>
+        <tr>
+            <td colspan="3">Memory</td>
+            <td colspan="3">16GB or more, recommended 32GB+</td>
+        </tr>
+        <tr>
+            <td colspan="3">Python Version</td>
+            <td colspan="3">3.10 (please make sure to create a Python 3.10 virtual environment using conda)</td>
+        </tr>
+        <tr>
+            <td colspan="3">Nvidia Driver Version</td>
+            <td>latest (Proprietary Driver)</td>
+            <td>latest</td>
+            <td>None</td>
+        </tr>
+        <tr>
+            <td colspan="3">CUDA Environment</td>
+            <td>Automatic installation [12.1 (pytorch) + 11.8 (paddle)]</td>
+            <td>11.8 (manual installation) + cuDNN v8.7.0 (manual installation)</td>
+            <td>None</td>
+        </tr>
+        <tr>
+            <td rowspan="2">GPU Hardware Support List</td>
+            <td colspan="2">Minimum Requirement 8G+ VRAM</td>
+            <td colspan="2">3060ti/3070/4060<br>
+            8G VRAM enables acceleration of layout analysis, formula recognition and OCR</td>
+            <td rowspan="2">None</td>
+        </tr>
+        <tr>
+            <td colspan="2">Recommended Configuration 10G+ VRAM</td>
+            <td colspan="2">3080/3080ti/3090/3090ti/4070/4070ti/4070ti super/4080/4090<br>
+            10G VRAM or more can accelerate layout analysis, formula recognition, OCR and table recognition simultaneously
+            </td>
+        </tr>
+    </table>
+
 
 
 Create an environment
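The GPU rows of the support table above reduce to a two-threshold rule. The sketch below encodes that rule. ``accelerated_features`` and its stage labels are illustrative names. VRAM is assumed to be given in gigabytes. Cards below 8 GB are assumed to fall back to CPU, because the table lists no tier for them.

```python
def accelerated_features(vram_gb: float) -> list[str]:
    """Return the stages the support table says can be GPU-accelerated."""
    if vram_gb < 8:
        # Below the 8G minimum the table lists no GPU tier (assumed CPU-only).
        return []
    # Minimum tier (8G+): layout analysis, formula recognition and OCR.
    features = ["layout analysis", "formula recognition", "OCR"]
    if vram_gb >= 10:
        # Recommended tier (10G+): table recognition can run at the same time.
        features.append("table recognition")
    return features
```

For example, ``accelerated_features(8)`` returns ``['layout analysis', 'formula recognition', 'OCR']``. Any card with 10 GB or more also gets ``'table recognition'``.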
			
--- a/next_docs/en/user_guide/quick_start/command_line.rst
+++ b/next_docs/en/user_guide/quick_start/command_line.rst
@@ -55,5 +55,8 @@ directory. The output file list is as follows:
    ├── some_pdf_spans.pdf                   # smallest granularity bbox position information diagram
    └── some_pdf_content_list.json           # Rich text JSON arranged in reading order
 
-For more information about the output files, please refer to the :doc:`../tutorial/output_file_description`
+.. admonition:: Tip
+   :class: tip
+
+   For more information about the output files, please refer to the :doc:`../tutorial/output_file_description`
 
--- a/next_docs/en/user_guide/quick_start/extract_text.rst
+++ b/next_docs/en/user_guide/quick_start/extract_text.rst
@@ -1,10 +0,0 @@
-
-
-Extract Content from Pdf
-========================
-
-.. code:: python
-
-    from magic_pdf.data.read_api import read_local_pdfs
-    from magic_pdf.pdf_parse_union_core_v2 import pdf_parse_union
-    from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze
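The removed "Extract Content from Pdf" page only listed imports. The command-line output listing in ``command_line.rst`` already includes ``some_pdf_content_list.json``, rich-text JSON arranged in reading order. The sketch below collects those files after a run such as ``magic-pdf -p small_ocr.pdf -o ./output``. It is not the removed page's API. ``load_content_lists`` is a hypothetical helper. It assumes each file holds a top-level JSON array and that outputs may sit in nested subdirectories.

```python
import json
from pathlib import Path


def load_content_lists(output_dir: str) -> dict[str, list]:
    """Map each *_content_list.json under output_dir to its parsed JSON array.

    Hypothetical helper: it relies only on the file-name suffix shown in the
    command-line output listing and searches subdirectories recursively.
    """
    results: dict[str, list] = {}
    for path in sorted(Path(output_dir).rglob("*_content_list.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            # Assumption check: the content list is expected to be a JSON array.
            raise ValueError(f"{path} does not contain a JSON array")
        # Strip the suffix so "some_pdf_content_list.json" is keyed as "some_pdf".
        results[path.name[: -len("_content_list.json")]] = data
    return results
```

Because the search is recursive, the helper does not depend on the exact subdirectory layout that ``magic-pdf`` creates under ``./output``.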
			
--- a/next_docs/zh_cn/_static/image/MinerU-logo-hq.png
+++ b/next_docs/zh_cn/_static/image/MinerU-logo-hq.png
--- a/next_docs/zh_cn/_static/image/MinerU-logo.png
+++ b/next_docs/zh_cn/_static/image/MinerU-logo.png
--- a/next_docs/zh_cn/_static/image/ReadTheDocs.svg
+++ b/next_docs/zh_cn/_static/image/ReadTheDocs.svg
--- a/next_docs/zh_cn/_static/image/datalab_logo.png
+++ b/next_docs/zh_cn/_static/image/datalab_logo.png
--- a/next_docs/zh_cn/_static/image/flowchart_en.png
+++ b/next_docs/zh_cn/_static/image/flowchart_en.png
--- a/next_docs/zh_cn/_static/image/flowchart_zh_cn.png
+++ b/next_docs/zh_cn/_static/image/flowchart_zh_cn.png
--- a/next_docs/zh_cn/_static/image/layout_example.png
+++ b/next_docs/zh_cn/_static/image/layout_example.png
--- a/next_docs/zh_cn/_static/image/poly.png
+++ b/next_docs/zh_cn/_static/image/poly.png
--- a/next_docs/zh_cn/_static/image/project_panorama_en.png
+++ b/next_docs/zh_cn/_static/image/project_panorama_en.png
--- a/next_docs/zh_cn/_static/image/project_panorama_zh_cn.png
+++ b/next_docs/zh_cn/_static/image/project_panorama_zh_cn.png
--- a/next_docs/zh_cn/_static/image/spans_example.png
+++ b/next_docs/zh_cn/_static/image/spans_example.png
--- a/next_docs/zh_cn/_static/image/web_demo_1.png
+++ b/next_docs/zh_cn/_static/image/web_demo_1.png
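--- a/next_docs/zh_cn/additional_notes/faq.rst
+++ b/next_docs/zh_cn/additional_notes/faq.rst
@@ -0,0 +1,72 @@
+Frequently Asked Questions
+==========================
+
+1. On newer macOS versions, ``pip install magic-pdf[full]`` fails with ``zsh: no matches found: magic-pdf[full]``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On macOS, the default shell has changed from Bash to Z shell (zsh), which applies special matching logic to certain kinds of strings (here, the square brackets in ``magic-pdf[full]``). This can cause the ``no matches found`` error. Disable the globbing feature on the command line, then run the installation command again:
+
+.. code:: bash
+
+   setopt no_nomatch
+   pip install magic-pdf[full]
+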
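+2. ``_pickle.UnpicklingError: invalid load key, 'v'.`` error during use
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is likely caused by incompletely downloaded model files. Try downloading the model files again. Reference: https://github.com/opendatalab/MinerU/issues/143
+
+3. Where should the model files be downloaded, and how should ``models-dir`` be configured?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The path to the model files is configured in ``magic-pdf.json`` as follows:
+
+.. code:: json
+
+   {
+     "models-dir": "/tmp/models"
+   }
+
+This path must be an absolute path, not a relative one. You can get the absolute path by running ``pwd`` inside the models directory.
+Reference: https://github.com/opendatalab/MinerU/issues/155#issuecomment-2230216874
+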
+4. ``ImportError: libGL.so.1: cannot open shared object file: No such file or directory`` on Ubuntu 22.04 under WSL2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Ubuntu 22.04 under WSL2 is missing the ``libgl`` library. Install it with the following command to fix the error:
+
+.. code:: bash
+
+   sudo apt-get install libgl1-mesa-glx
+
+Reference: https://github.com/opendatalab/MinerU/issues/388
+
+5. Error ``ModuleNotFoundError: No module named 'fairscale'``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Uninstall the module and reinstall it:
+
+.. code:: bash
+
+   pip uninstall fairscale
+   pip install fairscale
+
+Reference: https://github.com/opendatalab/MinerU/issues/411
+
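+6. On some newer devices such as the H100, text parsed with CUDA-accelerated OCR is garbled
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CUDA 11 has poor compatibility with newer graphics cards, so you need to upgrade the CUDA version used by Paddle:
+
+.. code:: bash
+
+   pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
+
+Reference: https://github.com/opendatalab/MinerU/issues/558
+
+7. On some Linux servers, the program fails immediately with ``非法指令 (核心已转储)`` or ``Illegal instruction (core dumped)``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This may be because the server CPU does not support the AVX/AVX2 instruction sets, or because the CPU supports them but the operations team has disabled them. Ask your operations team to lift the restriction, or switch to a different server.
+
+References: https://github.com/opendatalab/MinerU/issues/591, https://github.com/opendatalab/MinerU/issues/736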
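FAQ item 7 blames the ``Illegal instruction`` crash on missing AVX/AVX2 support. Before contacting your operations team, you can check that explanation by inspecting the CPU flags the Linux kernel exposes. The sketch assumes an x86 Linux host where ``/proc/cpuinfo`` lists features on ``flags`` lines. ``missing_cpu_flags`` and ``check_this_machine`` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable


def missing_cpu_flags(cpuinfo_text: str, required: Iterable[str] = ("avx", "avx2")) -> list[str]:
    """Return the required flags absent from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    # No 'flags' line (e.g. a non-x86 CPU): report every required flag as missing.
    return list(required)


def check_this_machine() -> list[str]:
    """Read /proc/cpuinfo (Linux only) and report missing AVX/AVX2 flags."""
    return missing_cpu_flags(Path("/proc/cpuinfo").read_text())
```

An empty list from ``check_this_machine()`` means the kernel reports both flags. A non-empty list makes the AVX/AVX2 explanation likely. Flags masked by a hypervisor or by administrators also show up as missing.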
			
--- a/next_docs/zh_cn/additional_notes/glossary.rst
+++ b/next_docs/zh_cn/additional_notes/glossary.rst
@@ -0,0 +1,11 @@
+
+
+Glossary
+===========
+
+1. jsonl
+    TODO: add description
+
+2. magic-pdf.json
+    TODO: add description
+
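FAQ item 3 requires ``models-dir`` in ``magic-pdf.json`` to be an absolute path. The sketch below validates that entry. It assumes the key sits at the top level of the JSON object, as in the FAQ example. ``check_models_dir`` is a hypothetical helper, not a MinerU API.

```python
import json
import os
from pathlib import Path


def check_models_dir(config_path: Path) -> list[str]:
    """Return problems found with the "models-dir" entry of a magic-pdf.json file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    models_dir = config.get("models-dir")
    if not models_dir:
        return ['"models-dir" is missing or empty']
    problems = []
    if not os.path.isabs(models_dir):
        # abspath() resolves against the current working directory, which is
        # only a guess at what the user meant; the FAQ asks for an absolute path.
        problems.append(f'"models-dir" is relative; did you mean {os.path.abspath(models_dir)!r}?')
    elif not os.path.isdir(models_dir):
        problems.append(f'"models-dir" points to a missing directory: {models_dir}')
    return problems
```

An empty result means the configured path is absolute and points to an existing directory. A relative path is reported with a suggested absolute form, mirroring the FAQ's advice to run ``pwd`` in the models directory.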