8 Commits 1262c510b7 ... 35ee4abec4

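Author SHA1 Message Date
  zhch158_admin 35ee4abec4 feat(update OCR document config): Modify the OCR document config: update the input file, output directory, and config file path, and adjust the number of pages processed to support the new document format. 1 month ago
  zhch158_admin 5f33763ee3 feat(enhance OCR framework): Add new config parameters to the TextFiller class to handle OCR box width overflow and neighboring-cell overlap, refine the cross-cell detection logic, and improve text-filling accuracy. 1 month ago
  zhch158_admin f32733271c feat(improve watermark handling): Add page-level watermark removal to the image preprocessing flow, update the related processors to support a skip-watermark option, and improve the accuracy of table orientation correction. 1 month ago
  zhch158_admin 3e4d9ab6f0 feat(add document configs): Add three new OCR document config files (陈3_微信图, 彭_广东兴宁农村商业银行, and 钟_广东陆丰农村商业银行) that define the OCR tools used and their result directories. 1 month ago
  zhch158_admin 5263c0e66c fix(update Python environment name): Rename the Python environment in the test files from `mineru2` to `mineru` for consistency. 1 month ago
  zhch158_admin 64ad4a204d fix(skew angle detection): Change how the SkewDetector class handles the cv2.fitLine return value so that each component is explicitly converted to a scalar, and switch exception handling to error-level logging. 1 month ago
  zhch158_admin fb3ea48bb4 feat(add bank transaction statement scene config): Add the bank transaction statement V4 scene config, combining several OCR recognition features with layout detection and supporting both wired and wireless tables. 1 month ago
  zhch158_admin 6518b09bbd fix(update environment name): Rename the `mineru2` environment to `mineru` in all configs and docs for consistency and accuracy. 1 month ago
28 changed files with 491 additions and 121 deletions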
  1. 18 18
      .github/copilot-instructions.md
  2. 1 1
      README.md
  3. 3 3
      docs/mineru/README.md
  4. 1 1
      docs/ocr_tools/universal_doc_parser/llama.cpp配置说明.md
  5. 7 7
      ocr_tools/daemons/README.md
  6. 3 3
      ocr_tools/daemons/glmocr_local_daemon.sh
  7. 3 3
      ocr_tools/daemons/mineru_local_daemon.sh
  8. 1 1
      ocr_tools/daemons/mineru_vllm_daemon.sh
  9. 3 3
      ocr_tools/daemons/paddle_local_daemon.sh
  10. 2 2
      ocr_tools/ocr_batch/README.md
  11. 2 2
      ocr_tools/ocr_batch/batch_process_pdf.py
  12. 3 1
      ocr_tools/ocr_batch/pdf_list.txt
  13. 3 7
      ocr_tools/ocr_batch/pdf_list_local.txt
  14. 22 6
      ocr_tools/ocr_batch/processor_configs.yaml
  15. 211 0
      ocr_tools/universal_doc_parser/config/bank_statement_glm_vl_local.yaml
  16. 4 2
      ocr_tools/universal_doc_parser/core/element_processors.py
  17. 6 1
      ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py
  18. 5 5
      ocr_tools/universal_doc_parser/main_v2.py
  19. 11 1
      ocr_tools/universal_doc_parser/models/adapters/base.py
  20. 36 21
      ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py
  21. 9 4
      ocr_tools/universal_doc_parser/models/adapters/wired_table/skew_detection.py
  22. 54 9
      ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py
  23. 1 1
      ocr_tools/universal_doc_parser/tests/test_glmocr_adapter.py
  24. 21 18
      ocr_validator/config/global.yaml
  25. 20 0
      ocr_validator/config/彭_广东兴宁农村商业银行.yaml
  26. 20 0
      ocr_validator/config/钟_广东陆丰农村商业银行.yaml
  27. 20 0
      ocr_validator/config/陈3_微信图.yaml
  28. 1 1
      pyrightconfig.json

+ 18 - 18
.github/copilot-instructions.md

@@ -11,11 +11,11 @@
 
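 ## Python environment requirements
 
-**Important: all code in this project must run in the `mineru2` conda environment.**
+**Important: all code in this project must run in the `mineru` conda environment.**
 
 ### Environment setup
-- **Python interpreter**: `/opt/miniconda3/envs/mineru2/bin/python`
-- **Conda environment**: `mineru2`
+- **Python interpreter**: `/opt/miniconda3/envs/mineru/bin/python`
+- **Conda environment**: `mineru`
 - **Python version**: 3.12+
 - **Platform**: macOS (Darwin)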
 
@@ -23,17 +23,17 @@
 
 1. **Activate the environment before running any Python script**:
    ```bash
-   conda activate mineru2
+   conda activate mineru
    ```
 
 2. **Or use the interpreter's full path directly**:
    ```bash
-   /opt/miniconda3/envs/mineru2/bin/python script.py
+   /opt/miniconda3/envs/mineru/bin/python script.py
    ```
 
 3. **When using the run_in_terminal tool**, use this command format:
    ```bash
-   conda activate mineru2 && python script.py
+   conda activate mineru && python script.py
    ```
 
 ### Project module paths
@@ -48,22 +48,22 @@
 
 #### Run the Streamlit app
 ```bash
-cd ocr_validator && conda activate mineru2 && streamlit run streamlit_ocr_validator.py --server.runOnSave=true
+cd ocr_validator && conda activate mineru && streamlit run streamlit_ocr_validator.py --server.runOnSave=true
 ```
 
 #### Run a Python script
 ```bash
-conda activate mineru2 && python script.py
+conda activate mineru && python script.py
 ```
 
 #### Install dependencies
 ```bash
-conda activate mineru2 && pip install package-name
+conda activate mineru && pip install package-name
 ```
 
 #### Run tests
 ```bash
-conda activate mineru2 && pytest tests/
+conda activate mineru && pytest tests/
 ```
 
 ### Prohibited operations
@@ -90,7 +90,7 @@ source venv/bin/activate
 
 ### Dependencies
 
-Main dependencies (already installed in the mineru2 environment):
+Main dependencies (already installed in the mineru environment):
 - streamlit >= 1.30.0
 - plotly >= 5.18.0
 - pandas >= 2.1.0
@@ -102,29 +102,29 @@ source venv/bin/activate
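 ### File operation rules
 
 1. When creating new files, make sure to use the project's module import paths
-2. When modifying config files, keep them consistent with the mineru2 environment
+2. When modifying config files, keep them consistent with the mineru environment
 3. When adding a new script, add a shebang at the top of the file:
    ```python
-   #!/opt/miniconda3/envs/mineru2/bin/python
+   #!/opt/miniconda3/envs/mineru/bin/python
    ```
 
 ### Debugging and testing
 
 When running tests or debugging, always use:
 ```bash
-conda activate mineru2 && python -m pytest
-conda activate mineru2 && python -m pdb script.py
+conda activate mineru && python -m pytest
+conda activate mineru && python -m pdb script.py
 ```
 
 ### Environment verification
 
 Before running any Python code, verify the environment:
 ```bash
-conda activate mineru2
+conda activate mineru
 python -c "import sys; print(sys.executable)"
-# Expected output: /opt/miniconda3/envs/mineru2/bin/python
+# Expected output: /opt/miniconda3/envs/mineru/bin/python
 ```
 
 ---
 
-**Remember: anything that runs Python code, installs packages, or runs tests must be done in the mineru2 environment!**
+**Remember: anything that runs Python code, installs packages, or runs tests must be done in the mineru environment!**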
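
The environment check in this file can also be done from inside Python. The following is a minimal sketch, assuming the conda root `/opt/miniconda3` stated in the instructions; the helper name `in_conda_env` is hypothetical and does not exist in the repository.

```python
import sys
from pathlib import PurePosixPath

# Assumption: conda root as stated in the instructions file.
CONDA_ROOT = PurePosixPath("/opt/miniconda3")


def in_conda_env(executable: str, env_name: str = "mineru") -> bool:
    """Return True if `executable` is the interpreter of the given conda env."""
    expected = CONDA_ROOT / "envs" / env_name / "bin" / "python"
    return PurePosixPath(executable) == expected


if __name__ == "__main__":
    # Warn instead of failing so the check stays non-intrusive.
    if not in_conda_env(sys.executable):
        print(f"warning: not running in the mineru env: {sys.executable}")
```

A script could call this once at startup; after the rename, an interpreter under `envs/mineru2` would be reported as a mismatch.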

+ 1 - 1
README.md

@@ -161,7 +161,7 @@ git config --local user.email "zhch158@sina.com"
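 The project supports several Python environments; choose one based on the tool you use:
 
 - **PaddleX tools**: require the `paddle_env` environment (Python 3.11+)
-- **MinerU tools**: require the `mineru2` environment (Python 3.12+)
+- **MinerU tools**: require the `mineru` environment (Python 3.12+)
 - **DotsOCR tools**: require the `py312` environment (Python 3.12+)
 
 For detailed environment setup, see: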

+ 3 - 3
docs/mineru/README.md

@@ -15,8 +15,8 @@ git config --local user.email "zhch158@sina.com"
 ### 1.2 Python environment installation
 ```bash
 # Create the conda environment
-conda create -n mineru2 python=3.12
-conda activate mineru2
+conda create -n mineru python=3.12
+conda activate mineru
 
 # Install MinerU core
 pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
@@ -37,7 +37,7 @@ python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda
 python -m mineru.cli.models_download
 
 # Models are saved to the $MODELSCOPE_CACHE directory
-# Default path: /home/ubuntu/models/modelscope_cache/models/OpenDataLab/MinerU2___5-2509-1___2B
+# Default path: /home/ubuntu/models/modelscope_cache/models/OpenDataLab/mineru___5-2509-1___2B
 ```
 
 ### 1.4 Environment variable configuration

+ 1 - 1
docs/ocr_tools/universal_doc_parser/llama.cpp配置说明.md

@@ -182,7 +182,7 @@ max_tokens <= CONTEXT_SIZE
 
 2. **Confirm the conda environment:**
    ```bash
-   conda activate mineru2
+   conda activate mineru
    ```
 
 3. **Verify the model files:**

+ 7 - 7
ocr_tools/daemons/README.md

@@ -83,7 +83,7 @@ cd ocr_tools/daemons
 **Service type**: MinerU vLLM service
 
 **Configuration parameters**:
-- `CONDA_ENV`: conda environment name (default: `mineru2`)
+- `CONDA_ENV`: conda environment name (default: `mineru`)
 - `PORT`: service port (default: `8121`)
 - `HOST`: service host (default: `0.0.0.0`)
 - `MODEL_PATH`: model path
@@ -101,7 +101,7 @@ cd ocr_tools/daemons
 - API docs: `http://localhost:8121/docs`
 
 **Dependencies**:
-- conda environment: `mineru2`
+- conda environment: `mineru`
 - Must be installed: `mineru-vllm-server`
 
 **Client usage**:
@@ -224,7 +224,7 @@ python main.py --input document.pdf --output_dir ./output --ip localhost --port
 **Service type**: GLM-OCR local GGUF model service (macOS/Metal)
 
 **Configuration parameters**:
-- `CONDA_ENV`: conda environment name (default: `mineru2`)
+- `CONDA_ENV`: conda environment name (default: `mineru`)
 - `PORT`: service port (default: `8080`)
 - `HOST`: service host (default: `0.0.0.0`)
 - `MODEL_PATH`: GGUF model path (default: `~/Library/Caches/llama.cpp/ggml-org_GLM-OCR-GGUF_GLM-OCR-Q8_0.gguf`)
@@ -246,7 +246,7 @@ python main.py --input document.pdf --output_dir ./output --ip localhost --port
 **Dependencies**:
 - macOS (M4 Pro recommended)
 - Install llama.cpp via Homebrew: `brew install llama.cpp`
-- conda environment: `mineru2`
+- conda environment: `mineru`
 - Model files are located in: `~/Library/Caches/llama.cpp/`
 
 **Model size**:
@@ -288,7 +288,7 @@ curl -X POST http://localhost:8080/v1/chat/completions \
 **Service type**: PaddleOCR-VL-1.5 local GGUF model service (macOS/Metal)
 
 **Configuration parameters**:
-- `CONDA_ENV`: conda environment name (default: `mineru2`)
+- `CONDA_ENV`: conda environment name (default: `mineru`)
 - `PORT`: service port (default: `8081`)
 - `HOST`: service host (default: `0.0.0.0`)
 - `MODEL_PATH`: GGUF model path (default: `~/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5.gguf`)
@@ -310,7 +310,7 @@ curl -X POST http://localhost:8080/v1/chat/completions \
 **Dependencies**:
 - macOS (M4 Pro recommended)
 - Install llama.cpp via Homebrew: `brew install llama.cpp`
-- conda environment: `mineru2`
+- conda environment: `mineru`
 - Model files are located in: `~/Library/Caches/llama.cpp/`
 
 **Model size**:
@@ -369,7 +369,7 @@ curl -X POST http://localhost:8081/v1/chat/completions \
 
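 #### macOS/Metal environment (local GGUF services)
 - Install llama.cpp: `brew install llama.cpp`
-- Make sure the conda environment `mineru2` has been created
+- Make sure the conda environment `mineru` has been created
 - Model files are downloaded automatically to `~/Library/Caches/llama.cpp/`, or can be downloaded manually: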
   - GLM-OCR: https://huggingface.co/ggml-org/GLM-OCR-GGUF
   - PaddleOCR-VL: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF

+ 3 - 3
ocr_tools/daemons/glmocr_local_daemon.sh

@@ -18,7 +18,7 @@ PIDFILE="$LOGDIR/glmocr_llamaserver.pid"
 LOGFILE="$LOGDIR/glmocr_llamaserver.log"
 
 # Configuration parameters
-CONDA_ENV="mineru2"
+CONDA_ENV="mineru"
 PORT="8101"
 HOST="0.0.0.0"
 
@@ -297,7 +297,7 @@ test_client() {
     echo ""
     echo "测试命令示例:"
     echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
-    echo "  conda activate mineru2"
+    echo "  conda activate mineru"
     echo "  python parse.py --input /path/to/test/image.png --config $CONFIG_FILE --debug"
     echo ""
     echo "或者使用 curl 直接测试 API:"
@@ -350,7 +350,7 @@ usage() {
     echo "前置要求:"
     echo "  1. 安装 llama.cpp: brew install llama.cpp"
     echo "  2. 模型文件位于: ~/Library/Caches/llama.cpp/"
-    echo "  3. conda 环境 mineru2 已配置"
+    echo "  3. conda 环境 mineru 已配置"
 }
 
 case "$1" in

+ 3 - 3
ocr_tools/daemons/mineru_local_daemon.sh

@@ -18,7 +18,7 @@ PIDFILE="$LOGDIR/mineru_llamaserver.pid"
 LOGFILE="$LOGDIR/mineru_llamaserver.log"
 
 # Configuration parameters
-CONDA_ENV="mineru2"
+CONDA_ENV="mineru"
 PORT="8103"
 HOST="0.0.0.0"
 
@@ -306,7 +306,7 @@ test_client() {
     echo ""
     echo "测试命令示例:"
     echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
-    echo "  conda activate mineru2"
+    echo "  conda activate mineru"
     echo "  python main_v2.py -i /path/to/test.pdf -c $CONFIG_FILE -o /tmp/test_output -s bank_statement --pages 1 --streaming"
     echo ""
     echo "或者使用 curl 直接测试 API:"
@@ -359,7 +359,7 @@ usage() {
     echo "前置要求:"
     echo "  1. 安装 llama.cpp: brew install llama.cpp"
     echo "  2. 首次下载模型: llama-server -hf mradermacher/MinerU2.5-Pro-2604-1.2B-GGUF:Q8_0"
-    echo "  3. conda 环境 mineru2 已配置"
+    echo "  3. conda 环境 mineru 已配置"
 }
 
 case "$1" in

+ 1 - 1
ocr_tools/daemons/mineru_vllm_daemon.sh

@@ -10,7 +10,7 @@ PIDFILE="$LOGDIR/mineru_vllm.pid"
 LOGFILE="$LOGDIR/mineru_vllm.log"
 
 # Configuration parameters
-CONDA_ENV="mineru2"
+CONDA_ENV="mineru"
 PORT="8121"
 HOST="0.0.0.0"
 MODEL_PATH="/home/ubuntu/models/modelscope_cache/models/OpenDataLab/MinerU2___5-2509-1___2B"

+ 3 - 3
ocr_tools/daemons/paddle_local_daemon.sh

@@ -17,7 +17,7 @@ PIDFILE="$LOGDIR/paddleocr_llamaserver.pid"
 LOGFILE="$LOGDIR/paddleocr_llamaserver.log"
 
 # Configuration parameters
-CONDA_ENV="mineru2"
+CONDA_ENV="mineru"
 PORT="8102"
 HOST="0.0.0.0"
 
@@ -300,7 +300,7 @@ test_client() {
     echo ""
     echo "测试命令示例:"
     echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
-    echo "  conda activate mineru2"
+    echo "  conda activate mineru"
     echo "  python parse.py --input /path/to/test/image.png --config $CONFIG_FILE --debug"
     echo ""
     echo "或者使用 curl 直接测试 API:"
@@ -353,7 +353,7 @@ usage() {
     echo "前置要求:"
     echo "  1. 安装 llama.cpp: brew install llama.cpp"
     echo "  2. 模型文件位于: ~/Library/Caches/llama.cpp/"
-    echo "  3. conda 环境 mineru2 已配置"
+    echo "  3. conda 环境 mineru 已配置"
 }
 
 case "$1" in

+ 2 - 2
ocr_tools/ocr_batch/README.md

@@ -15,7 +15,7 @@
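 # 1. Use DotsOCR (switches to the py312 environment automatically)
 python batch_process_pdf.py -p dotsocr_vllm -f pdf_list.txt
 
-# 2. Use MinerU (switches to the mineru2 environment automatically)
+# 2. Use MinerU (switches to the mineru environment automatically)
 python batch_process_pdf.py -p mineru_vllm -f pdf_list.txt
 
 # 3. Use PaddleOCR (switches to paddle_env automatically)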
@@ -53,7 +53,7 @@ conda activate py312 && python /path/to/dotsocr_vllm_multthreads.py \
 
 ### MinerU:
 ```bash
-conda activate mineru2 && python /path/to/mineru2_vllm_multthreads.py \
+conda activate mineru && python /path/to/mineru2_vllm_multthreads.py \
     --input /path/to/file.pdf \
     --output_dir /path/to/output \
     --server_url=http://10.192.72.11:8121

+ 2 - 2
ocr_tools/ocr_batch/batch_process_pdf.py

@@ -113,7 +113,7 @@ class ConfigManager:
                     '--batch_size=1'
                 ],
                 'output_subdir': 'mineru_vllm_results',
-                'venv': 'conda activate mineru2',
+                'venv': 'conda activate mineru',
                 'description': 'MinerU vLLM 处理器',
                 'log_subdir': 'logs/mineru_vllm'  # 🎯 newly added
             },
@@ -669,7 +669,7 @@ def create_parser() -> argparse.ArgumentParser:
   2. 使用 DotsOCR 处理器 (自动切换到 py312 环境):
      python batch_process_pdf.py -p dotsocr_vllm -f pdf_list.txt
 
-  3. 使用 MinerU 处理器 (自动切换到 mineru2 环境):
+  3. 使用 MinerU 处理器 (自动切换到 mineru 环境):
      python batch_process_pdf.py -p mineru_vllm -f pdf_list.txt
 
   4. 处理指定目录下所有 PDF:

+ 3 - 1
ocr_tools/ocr_batch/pdf_list.txt

@@ -17,4 +17,6 @@ B用户_扫描流水.pdf,bank_statement
 朱_中信银行图.pdf,bank_statement
 韩_中国银行图.pdf,bank_statemen
 严_农业银行.pdf,bank_statement
-
+陈3_微信图.pdf,bank_statement
+彭_广东兴宁农村商业银行.pdf,bank_statement
+钟_广东陆丰农村商业银行.pdf,bank_statement
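
Each row of the list file pairs a file name with a scene, and the unchanged row for `韩_中国银行图.pdf` above spells its scene `bank_statemen`. The sketch below shows one way such rows could be validated before a batch run. It assumes the two scenes named in the list file's header comment are the only valid ones; `parse_pdf_list` is a hypothetical helper, not a function from `batch_process_pdf.py`.

```python
from typing import Iterable

# Assumption: the scenes named in the list file's header comment.
KNOWN_SCENES = {"bank_statement", "financial_report"}


def parse_pdf_list(lines: Iterable[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Split `name,scene` rows into valid entries and problem messages."""
    entries: list[tuple[str, str]] = []
    problems: list[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the last comma so commas inside file names survive.
        name, sep, scene = line.rpartition(",")
        scene = scene.strip()
        if not sep:
            problems.append(f"line {lineno}: missing scene: {line!r}")
        elif scene not in KNOWN_SCENES:
            problems.append(f"line {lineno}: unknown scene {scene!r}")
        else:
            entries.append((name.strip(), scene))
    return entries, problems
```

Run against `pdf_list.txt`, a check like this would report the misspelled scene instead of passing it on to the processor.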

+ 3 - 7
ocr_tools/ocr_batch/pdf_list_local.txt

@@ -1,8 +1,4 @@
 # filename<TAB>","scene (bank_statement / financial_report)
-对公_招商银行图.pdf,bank_statement
-B用户_扫描流水.pdf,bank_statement
-康强_北京农村商业银行.pdf,bank_statement
-施博深.pdf,bank_statement
-山西云集科技有限公司.pdf,bank_statement
-许_民生银行图.pdf,bank_statement
-严_农业银行.pdf,bank_statement
+陈3_微信图.pdf,bank_statement
+彭_广东兴宁农村商业银行.pdf,bank_statement
+钟_广东陆丰农村商业银行.pdf,bank_statement

+ 22 - 6
ocr_tools/ocr_batch/processor_configs.yaml

@@ -21,7 +21,7 @@ processors:
       - "--log_level=DEBUG"
     output_subdir: "bank_statement_yusys_v4"
     log_subdir: "logs/bank_statement_yusys_v4"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "YUSYS Wired UNET OCR 框架 GLM-OCR"
 
   yusys_ocr_v3:
@@ -40,7 +40,7 @@ processors:
     log_subdir: "logs/bank_statement_yusys_v3"
     # output_subdir: "bank_statement_yusys_v2"
     # log_subdir: "logs/bank_statement_yusys_v2"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "YUSYS Wired UNET OCR 框架"
 
   yusys_mineru:
@@ -59,7 +59,7 @@ processors:
     log_subdir: "logs/bank_statement_mineru_vl"
     # output_subdir: "bank_statement_yusys_v2"
     # log_subdir: "logs/bank_statement_yusys_v2"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "YUSYS MinerU OCR 框架"
 
   yusys_ocr_local:
@@ -75,9 +75,25 @@ processors:
       - "--log_level=DEBUG"
     output_subdir: "bank_statement_yusys_local"
     log_subdir: "logs/bank_statement_yusys_local"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "YUSYS(local) Wired UNET OCR GLM-OCR"
 
+  yusys_glmocr_local:
+    script: "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/main_v2.py"
+    input_arg: "--input"
+    output_arg: "--output_dir"
+    scene_arg: "--scene"
+    extra_args:
+      - "--config=/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/config/bank_statement_glm_vl_local.yaml"
+      - "--pages=1-35"
+      - "--streaming"
+      - "--debug"
+      - "--log_level=DEBUG"
+    output_subdir: "bank_statement_yusys_glmocr_local"
+    log_subdir: "logs/bank_statement_yusys_glmocr_local"
+    venv: "conda activate mineru"
+    description: "YUSYS(local) OCR GLM-OCR VLM"
+
   yusys_paddleocr_local:
     script: "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/main_v2.py"
     input_arg: "--input"
@@ -91,7 +107,7 @@ processors:
       - "--log_level=DEBUG"
     output_subdir: "bank_statement_yusys_paddleocr_local"
     log_subdir: "logs/bank_statement_yusys_paddleocr_local"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "YUSYS(local) Wired UNET OCR PaddleOCR-VL"
 
   # -------------------------------------------------------------------------
@@ -182,7 +198,7 @@ processors:
       - "--batch_size=1"
     output_subdir: "mineru_vllm_results"
     log_subdir: "logs/mineru_vllm"
-    venv: "conda activate mineru2"
+    venv: "conda activate mineru"
     description: "MinerU vLLM 处理器 - 支持PDF和图片"
 
   # -------------------------------------------------------------------------
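
Each processor entry above carries a `venv` activation command, a `script`, argument names, and `extra_args`. The sketch below shows how such an entry might be turned into a single shell command, in the `conda activate ... && python ...` form used in the ocr_batch README. How `batch_process_pdf.py` actually assembles the command is not shown in this diff, so `build_command` is a hypothetical helper and the joining rule is an assumption.

```python
import shlex


def build_command(cfg: dict, pdf: str, out_dir: str, scene: str) -> str:
    """Assemble one shell command line from a processor entry."""
    argv = ["python", cfg["script"], cfg["input_arg"], pdf, cfg["output_arg"], out_dir]
    if cfg.get("scene_arg"):
        argv += [cfg["scene_arg"], scene]
    argv += cfg.get("extra_args", [])
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return f'{cfg["venv"]} && {shlex.join(argv)}'


if __name__ == "__main__":
    entry = {
        "script": "main_v2.py",
        "input_arg": "--input",
        "output_arg": "--output_dir",
        "scene_arg": "--scene",
        "extra_args": ["--pages=1-35", "--streaming"],
        "venv": "conda activate mineru",
    }
    print(build_command(entry, "statement.pdf", "out", "bank_statement"))
```

Keeping the environment name only in `venv` is what lets a rename like `mineru2` to `mineru` stay a one-line change per processor.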

+ 211 - 0
ocr_tools/universal_doc_parser/config/bank_statement_glm_vl_local.yaml

@@ -0,0 +1,211 @@
+# Bank transaction statement scene config - V4
+# Pipeline V3 logic: wired tables use MinerU UNet; wireless tables/seals use GLM-OCR VLM
+# llama-server -hf ggml-org/GLM-OCR-GGUF
+scene_name: "bank_statement_yusys_local"
+
+description: "银行流水V4: PP-DocLayoutV3 layout + PaddleOCR + MinerU UNet(有线表格)+ GLM-OCR VLM(无线表格/seal)"
+
+input:
+  supported_formats: [".pdf", ".png", ".jpg", ".jpeg", ".bmp", ".tiff"]
+  dpi: 200
+  txt_pdf_watermark_removal:
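+    enabled: true   # Remove watermark XObjects before rendering text PDFs (keeps the text searchable)
+    sample_pages: 3  # Quick pre-check on the first N pages
+
+preprocessor:
+  module: "mineru"
+  orientation_classifier:
+    enabled: true
+    model_name: "paddle_orientation_classification"
+    model_dir: null  # Use the default path
+  unwarping:
+    enabled: false
+  # -------------------------------------------------------
+  # Watermark removal (for light, diagonal text watermarks on bank statements)
+  # -------------------------------------------------------
+  watermark_removal:
+    enabled: true           # Whether to enable watermark removal
+    threshold: 160          # Grayscale threshold (140-180): pixels above it are treated as watermark and turned white
+                            # A higher value is more conservative (watermark may remain); a lower value is more aggressive (light body text may be lost)
+    morph_close_kernel: 0   # Morphological closing kernel size (pixels); the default morph_kernel is set to 0 because closing is counterproductive on non-binary images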
+
+# ============================================================
+# Layout detection config - smart router (selects the model directly by scene)
+# ============================================================
+layout_detection:
+  module: "smart_router"
+  strategy: "scene"  # Select the model directly by scene; skip ocr_eval
+
+  # Scene strategy: the layout model used directly for each scene
+  scene_strategy:
+    bank_statement:
+      model: "docling"
+    financial_report:
+      model: "paddle_ppdoclayoutv3"
+  default_model: "docling"
+
+  # Model definitions (multiple models)
+  models:
+    docling:
+      module: "docling"
+      model_name: "docling-layout-old"
+      model_dir: "ds4sd/docling-layout-old"
+      device: "cpu"
+      conf: 0.3
+      num_threads: 4
+
+    paddle_ppdoclayoutv3:
+      module: "paddle"
+      model_name: "PP-DocLayoutV3"
+      model_dir: "PaddlePaddle/PP-DocLayoutV3_safetensors"
+      device: "cpu"
+      conf: 0.3
+      num_threads: 4
+      batch_size: 1
+  
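+  # Post-processing
+  post_process:
+    # Convert large text blocks into tables (post-processing)
+    convert_large_text_to_table: true  # Whether to enable
+    min_text_area_ratio: 0.25         # Minimum area ratio (25%)
+    min_text_width_ratio: 0.4         # Minimum width ratio (40%)
+    min_text_height_ratio: 0.3        # Minimum height ratio (30%)
+
+  # Debug visualization
+  debug_options:
+    enabled: false              # Controlled by the --debug command-line flag; do not hardcode true here
+    output_dir: null             # Debug output directory; null disables output
+    prefix: ""                  # Filename prefix for saved files (e.g. the page number)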
+
+# ============================================================
+# OCR recognition config
+# ============================================================
+ocr_recognition:
+  module: "mineru"
+  language: "ch"
+  det_threshold: 0.5
+  unclip_ratio: 1.5
+  enable_merge_det_boxes: false
+  batch_size: 8
+  device: "cpu"
+
+# ============================================================
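+# Table classification config (automatically distinguishes wired/wireless tables)
+# ============================================================
+table_classification:
+  enabled: true               # Enable automatic table classification
+  module: "paddle"            # Classification model: paddle (MinerU PaddleTableClsModel)
+  confidence_threshold: 0.5   # Classification confidence threshold
+  batch_size: 16              # Batch size
+
+  # Debug visualization
+  debug_options:
+    enabled: false              # Controlled by the --debug command-line flag; do not hardcode true here
+    output_dir: null             # Debug output directory; null disables output
+    save_table_lines: true       # Save the table-line visualization (UNet horizontal/vertical line overlay)
+    image_format: "png"          # Visualization image format: png/jpg
+    prefix: ""                  # Filename prefix for saved files (e.g. page number/table index)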
+
+# ============================================================
+# Wired-table recognition config (MinerU UNet)
+# ============================================================
+table_recognition_wired:
+  use_wired_unet: false      # Do not use wired-table recognition
+  upscale_ratio: 3.333
+  need_ocr: true
+  row_threshold: 10
+  col_threshold: 15
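+  ocr_conf_threshold: 0.9       # Per-cell OCR confidence threshold
+  cell_crop_margin: 2
+  use_custom_postprocess: true  # Whether to use custom post-processing (enabled by default)
+
+  # Whether to enable deskewing
+  enable_deskew: true
+
+  # 🆕 Enable multi-source cell fusion
+  use_cell_fusion: true
+  
+  # Fusion engine config
+  cell_fusion:
+    # RT-DETR model path (required)
+    rtdetr_model_path: "/Users/zhch158/models/pytorch_models/Table/RT-DETR-L_wired_table_cell_det.onnx"
+    
+    # Fusion weights
+    unet_weight: 0.6        # UNet weight (strong on structure)
+    rtdetr_weight: 0.4      # RT-DETR weight (strong on robustness)
+    
+    # Thresholds
+    iou_merge_threshold: 0.7    # High-IoU merge threshold (weighted average when IoU > 0.7)
+    iou_nms_threshold: 0.5      # NMS de-duplication threshold
+    rtdetr_conf_threshold: 0.5  # RT-DETR confidence threshold
+    
+    # Feature switches
+    enable_ocr_compensation: true      # Enable OCR edge compensation
+
+  # Debug visualization
+  debug_options:
+    enabled: false              # Controlled by the --debug command-line flag; do not hardcode true here
+    output_dir: null             # Debug output directory; null disables output
+    save_table_lines: true       # Save the table-line visualization (UNet horizontal/vertical line overlay)
+    save_connected_components: true  # Save the cell map extracted from connected components
+    save_grid_structure: true    # Save the logical grid structure (row/col/rowspan/colspan)
+    save_text_overlay: true      # Save the text-filling overlay
+    image_format: "png"          # Visualization image format: png/jpg
+    prefix: ""                  # Filename prefix for saved files (e.g. page number/table index)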
+
+# ============================================================
+# VL recognition config - uses GLM-OCR (wireless tables + seal recognition)
+# ============================================================
+vl_recognition:
+  module: "glmocr"
+  api_url: "http://localhost:8101/v1/chat/completions"
+  api_key: null  # Optional; fill in if needed
+  model: "glm-ocr"
+  max_image_size: 3500  # Maximum image size recommended for GLM-OCR
+  resize_mode: 'max'    # Resize mode: 'max' keeps the aspect ratio, 'fixed' uses a fixed size
+  verify_ssl: false
+  
+  # Task prompt mapping - a different prompt for each task
+  task_prompt_mapping:
+    text: "Text Recognition:"
+    table: "Table Recognition:"
+    formula: "Formula Recognition:"
+    seal: "Seal Recognition:"  # Dedicated prompt for seal recognition
+  
+  # Model parameters
+  model_params:
+    connection_pool_size: 128  # HTTP connection pool size (should be >= max_workers)
+    http_timeout: 300          # HTTP request timeout (seconds)
+    connect_timeout: 30        # Connection timeout (seconds)
+    retry_max_attempts: 2      # Maximum number of retries
+    retry_backoff_base_seconds: 0.5
+    retry_backoff_max_seconds: 8.0
+    retry_jitter_ratio: 0.2
+    retry_status_codes: [429, 500, 502, 503, 504]
+    max_tokens: 16384
+    temperature: 0.1
+    top_p: 0.0001
+    top_k: 1
+    repetition_penalty: 1.1
+  
+  # Scene-specific config
+  table_recognition:
+
+# ============================================================
+# Output config
+# ============================================================
+output:
+  create_subdir: false
+  save_pdf_images: true
+  save_json: true
+  save_page_json: true
+  save_markdown: true
+  save_page_markdown: true
+  save_html: true
+  save_layout_image: true
+  save_ocr_image: true
+  draw_type_label: true
+  draw_bbox_number: true
+  save_enhanced_json: true
+  normalize_numbers: true
+  debug_mode: false

+ 4 - 2
ocr_tools/universal_doc_parser/core/element_processors.py

@@ -250,9 +250,11 @@ class ElementProcessors:
         
         table_angle = 0
         
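-        # 1. Table orientation detection
+        # 1. Table orientation detection (the watermark was already removed at page level; only the table's local orientation is corrected here)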
         try:
-            rotated_table, table_angle = self.preprocessor.process(cropped_table)
+            rotated_table, table_angle = self.preprocessor.process(
+                cropped_table, skip_watermark=True
+            )
             if table_angle != 0:
                 logger.info(f"📐 Table rotated {table_angle}°")
                 cropped_table = rotated_table  # cropped_table is now the rotated image

+ 6 - 1
ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py

@@ -395,13 +395,18 @@ class EnhancedDocPipeline:
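         # Image used for detection (may be rotated)
         detection_image = original_image.copy()
         rotate_angle = 0
+
+        # 0. Page-level watermark removal (once per full page; downstream steps such as table crops only correct orientation, which avoids removing the watermark twice)
+        detection_image = self.preprocessor.remove_watermark(detection_image)
         
         # 1. Page orientation detection
         # rotate_angle convention: the counterclockwise rotation (0/90/180/270) needed to make the image upright
         if pdf_type == 'ocr':
             # Scanned document: use OCR orientation detection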
             try:
-                detection_image, rotate_angle = self.preprocessor.process(original_image)
+                detection_image, rotate_angle = self.preprocessor.process(
+                    detection_image, skip_watermark=True
+                )
                 page_result['angle'] = rotate_angle
                 
                 if rotate_angle != 0:
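
The change in this file and in `element_processors.py` follows one pattern: remove the watermark once on the full page, then pass `skip_watermark=True` to every later `process` call. The sketch below illustrates that call pattern with a stand-in class that counts watermark passes instead of editing pixels; `CountingPreprocessor` and `process_page` are hypothetical names, and the orientation logic is omitted.

```python
class CountingPreprocessor:
    """Stand-in preprocessor that counts watermark passes."""

    def __init__(self) -> None:
        self.watermark_calls = 0

    def remove_watermark(self, image):
        self.watermark_calls += 1
        return image

    def process(self, image, skip_watermark: bool = False):
        cleaned = image if skip_watermark else self.remove_watermark(image)
        return cleaned, 0  # (image, rotation angle); orientation logic omitted


def process_page(pre: CountingPreprocessor, page, table_crops):
    """Clean the page once, then only correct orientation downstream."""
    page = pre.remove_watermark(page)                 # once per page
    page, _ = pre.process(page, skip_watermark=True)  # page orientation
    for crop in table_crops:
        pre.process(crop, skip_watermark=True)        # table-local orientation only
    return page
```

Because `skip_watermark` defaults to `False`, callers that were not updated keep the old behavior of removing the watermark inside `process`.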

+ 5 - 5
ocr_tools/universal_doc_parser/main_v2.py

@@ -644,10 +644,10 @@ if __name__ == "__main__":
             # "config": "./config/bank_statement_paddle_vl_local.yaml",
             # "log_file": "./output/logs/bank_statement_paddle_vl_local/process.log",
 
-            "input": "/Users/zhch158/workspace/data/流水分析/严_农业银行.pdf",
-            "output_dir": "/Users/zhch158/workspace/data/流水分析/严_农业银行/bank_statement_mineru_vl",
-            "config": "./config/bank_statement_mineru_vl_local.yaml",
-            "log_file": "./output/logs/bank_statement_mineru_vl/process.log",
+            "input": "/Users/zhch158/workspace/data/流水分析/陈3_微信图.pdf",
+            "output_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图/bank_statement_yusys_local",
+            "config": "./config/bank_statement_yusys_local.yaml",
+            "log_file": "./output/logs/bank_statement_yusys_local/process.log",
 
             # Config file
             # "config": "./config/bank_statement_yusys_v4.yaml",
@@ -662,7 +662,7 @@ if __name__ == "__main__":
             # "scene": "financial_report",
             
             # Page range (optional)
-            "pages": "1",  # process only page 1
+            "pages": "3",  # process only page 3
             # "pages": "1-3,5,7-10",  # process the specified pages
             # "pages": "83-109",  # process the specified pages
 

+ 11 - 1
ocr_tools/universal_doc_parser/models/adapters/base.py

@@ -26,8 +26,18 @@ class BaseAdapter(ABC):
 class BasePreprocessor(BaseAdapter):
     """Base class for preprocessors."""
     
+    def remove_watermark(self, image: Union[np.ndarray, Image.Image]) -> np.ndarray:
+        """Page-level watermark removal (no-op by default; subclasses may override)."""
+        if isinstance(image, Image.Image):
+            return np.array(image)
+        return image
+
     @abstractmethod
-    def process(self, image: Union[np.ndarray, Image.Image]) -> tuple[np.ndarray, int]:
+    def process(
+        self,
+        image: Union[np.ndarray, Image.Image],
+        skip_watermark: bool = False,
+    ) -> tuple[np.ndarray, int]:
         """
         Process the image.
         Return the processed image and the rotation angle.

+ 36 - 21
ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py

@@ -58,31 +58,46 @@ class MinerUPreprocessor(BasePreprocessor):
         """Release resources."""
         pass
 
-    def process(self, image: Union[np.ndarray, Image.Image]) -> tuple[np.ndarray, int]:
-        """Image preprocessing."""
-        # Convert to a numpy array
+    def remove_watermark(self, image: Union[np.ndarray, Image.Image]) -> np.ndarray:
+        """Page-level watermark removal (call once on the full-page image; do not repeat it on crops)."""
         if isinstance(image, Image.Image):
             image = np.array(image)
 
-        rotate_angle = 0
-        processed_image = image
-
-        # Watermark removal (done before orientation correction so that rotation does not add extra noise)
         watermark_cfg = self.config.get('watermark_removal', {})
-        if watermark_cfg.get('enabled', False):
-            threshold = watermark_cfg.get('threshold', 160)
-            morph_close_kernel = watermark_cfg.get('morph_close_kernel', 0)
-            try:
-                processed_image = remove_watermark_from_image_rgb(
-                    processed_image,
-                    threshold=threshold,
-                    morph_close_kernel=morph_close_kernel,
-                    return_pil=False,
-                )
-                logger.info(f"🧹 Watermark removed (threshold={threshold})")
-            except Exception as e:
-                logger.warning(f"⚠️ Watermark removal failed, using original: {e}")
-                processed_image = image
+        if not watermark_cfg.get('enabled', False):
+            return image
+
+        threshold = watermark_cfg.get('threshold', 160)
+        morph_close_kernel = watermark_cfg.get('morph_close_kernel', 0)
+        try:
+            cleaned = remove_watermark_from_image_rgb(
+                image,
+                threshold=threshold,
+                morph_close_kernel=morph_close_kernel,
+                return_pil=False,
+            )
+            logger.info(f"🧹 Watermark removed (threshold={threshold})")
+            return cleaned
+        except Exception as e:
+            logger.warning(f"⚠️ Watermark removal failed, using original: {e}")
+            return image
+
+    def process(
+        self,
+        image: Union[np.ndarray, Image.Image],
+        skip_watermark: bool = False,
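+    ) -> tuple[np.ndarray, int]:
+        """Image preprocessing: optional watermark removal plus orientation correction.
+
+        Args:
+            image: Input image.
+            skip_watermark: If True, skip watermark removal (the page was already cleaned, or the input is a crop).
+        """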
+        if isinstance(image, Image.Image):
+            image = np.array(image)
+
+        rotate_angle = 0
+        processed_image = image if skip_watermark else self.remove_watermark(image)
 
         # Orientation correction
         if self.orientation_classifier is not None:
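
The `threshold` setting read in `remove_watermark` is described in the scene config as a grayscale cutoff: pixels brighter than it are treated as watermark and turned white. The internals of `remove_watermark_from_image_rgb` are not shown in this diff, so the sketch below is only an illustration of that thresholding idea on a plain nested list of gray values; `whiten_above_threshold` is a hypothetical name and the real function may work differently.

```python
def whiten_above_threshold(gray: list[list[int]], threshold: int = 160) -> list[list[int]]:
    """Set every pixel brighter than `threshold` to white (255)."""
    return [[255 if px > threshold else px for px in row] for row in gray]


if __name__ == "__main__":
    # One row: dark text (30), light text (150), watermark-like grays (170, 200).
    row = [[30, 150, 170, 200]]
    print(whiten_above_threshold(row, threshold=160))
    print(whiten_above_threshold(row, threshold=140))
```

Lowering the threshold from 160 to 140 also whitens the light-text pixel at 150, which is the trade-off the config comment warns about.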

+ 9 - 4
ocr_tools/universal_doc_parser/models/adapters/wired_table/skew_detection.py

@@ -243,7 +243,12 @@ class SkewDetector:
                 
             # Use fitLine to get the direction
             # [vx, vy, x, y] normalized vector (vx,vy) and point (x,y)
-            [vx, vy, x, y] = cv2.fitLine(cnt, cv2.DIST_L2, 0, 0.01, 0.01)
+            # cv2.fitLine returns a (4, 1) array for 2D points; convert each component to a scalar explicitly
+            fit_result = cv2.fitLine(cnt, cv2.DIST_L2, 0, 0.01, 0.01)
+            vx = float(fit_result[0].item())
+            vy = float(fit_result[1].item())
+            x = float(fit_result[2].item())
+            y = float(fit_result[3].item())
             
             # Find the contour's extreme points along this direction
             # Project every point onto the line
@@ -347,10 +352,10 @@ class SkewDetector:
             return final_angle
             
         except Exception as e:
-            logger.warning(f"基于Mask的倾斜角度检测失败: {e}")
+            logger.error(f"基于Mask的倾斜角度检测失败: {e}")
             import traceback
-            logger.warning(traceback.format_exc())
-            return 0.0
+            logger.error(traceback.format_exc())
+            raise
     
     def apply_deskew(
         self,
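
Once `vx` and `vy` are plain floats, they can be passed to standard `math` functions. The rest of the angle computation in `SkewDetector` is not shown in this hunk, so the sketch below is an assumption about how a fitted direction vector might be turned into a line angle; `line_angle_degrees` is a hypothetical helper. Note that in image coordinates the y axis points down, so the sign convention may need flipping.

```python
import math


def line_angle_degrees(vx: float, vy: float) -> float:
    """Angle of direction (vx, vy) relative to the x axis, folded into (-90, 90]."""
    angle = math.degrees(math.atan2(vy, vx))
    # A line has no orientation, so (vx, vy) and (-vx, -vy) must give the same angle.
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle
```

Folding the result keeps nearly horizontal lines close to 0 degrees regardless of which way the fitted vector happens to point.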

+ 54 - 9
ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py

@@ -36,6 +36,12 @@ class TextFiller:
         self.min_overlap_area: float = config.get("min_overlap_area", 50.0)
         self.center_cell_ratio: float = config.get("center_cell_ratio", 0.5)
         self.other_cell_max_ratio: float = config.get("other_cell_max_ratio", 0.3)
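+        # If the OCR box is wider than the center cell's width * this ratio, treat it as a horizontal cross-cell mis-merge
+        self.ocr_bbox_width_overflow_ratio: float = config.get("ocr_bbox_width_overflow_ratio", 1.08)
+        # Lower bound on the overlap ratio between a neighboring-column cell and the OCR box (below other_cell_max_ratio, to catch neighbor overlaps of roughly 20-30%)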
+        self.horizontal_secondary_overlap_ratio: float = config.get(
+            "horizontal_secondary_overlap_ratio", 0.15
+        )
     
     @staticmethod
     def calculate_dynamic_confidence_threshold(text: str, base_threshold: float = 0.9) -> float:
@@ -266,6 +272,34 @@ class TextFiller:
                 logger.debug(f"检测到 OCR box 跨 {len(overlapping_cells)} 个单元格[{', '.join(map(str, overlapping_cells))}]: {ocr_item['text'][:20]}...")
                 
                 processed_ocr_indices.add(ocr_idx)
+
+        # Matched to a cell, but the OCR box is clearly wider than the cell (fallback for missed cross-cell detections)
+        # for cell_idx, cell_bbox in enumerate(bboxes):
+        #     if not matched_boxes_list[cell_idx]:
+        #         continue
+        #     cell_w = cell_bbox[2] - cell_bbox[0]
+        #     if cell_w <= 0:
+        #         continue
+        #     for box in matched_boxes_list[cell_idx]:
+        #         ocr_bbox = CoordinateUtils.poly_to_bbox(box.get("bbox", []))
+        #         if not ocr_bbox or len(ocr_bbox) < 4:
+        #             continue
+        #         ocr_w = ocr_bbox[2] - ocr_bbox[0]
+        #         if ocr_w <= cell_w * self.ocr_bbox_width_overflow_ratio:
+        #             continue
+        #         cx = (ocr_bbox[0] + ocr_bbox[2]) / 2
+        #         cy = (ocr_bbox[1] + ocr_bbox[3]) / 2
+        #         spanning = self.detect_ocr_box_spanning_cells(
+        #             ocr_bbox, bboxes, center_point=(cx, cy)
+        #         )
+        #         targets = spanning if len(spanning) >= 2 else [cell_idx]
+        #         for tidx in targets:
+        #             if tidx not in need_reocr_indices:
+        #                 need_reocr_indices.append(tidx)
+        #         logger.debug(
+        #             f"OCR box width ({ocr_w:.0f}) exceeds cell {cell_idx} width ({cell_w:.0f}); "
+        #             f"marking for re-OCR: {targets}"
+        #         )
         
         return texts, scores, matched_boxes_list, need_reocr_indices
     
@@ -383,19 +417,30 @@ class TextFiller:
                 if is_overlapping:
                     cell_overlaps.append((idx, overlap_ratio))
         
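-        # If the center point lies inside a cell, that cell's overlap ratio meets the threshold, and no other cell reaches the secondary threshold, do not mark as cross-cell
+        # When the center cell dominates, the cross-cell mark can be waived, except for horizontal mis-merges (OCR box too wide / significant neighbor-cell overlap)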
         if center_cell_idx is not None and cell_overlaps:
-            # Find the overlap ratio of the cell containing the center point
-            center_overlap = next((overlap for idx, overlap in cell_overlaps if idx == center_cell_idx), None)
+            center_overlap = next(
+                (overlap for idx, overlap in cell_overlaps if idx == center_cell_idx), None
+            )
             if center_overlap is not None and center_overlap >= self.center_cell_ratio:
-                # Check whether any other cell's overlap ratio also exceeds the secondary threshold
-                other_high_overlaps = [idx for idx, overlap in cell_overlaps 
-                                      if idx != center_cell_idx and overlap >= self.other_cell_max_ratio]
-                if not other_high_overlaps:
-                    # The center cell dominates, so this should not be marked as cross-cell
+                other_high_overlaps = [
+                    idx for idx, overlap in cell_overlaps
+                    if idx != center_cell_idx and overlap >= self.other_cell_max_ratio
+                ]
+                other_horizontal_overlaps = [
+                    idx for idx, overlap in cell_overlaps
+                    if idx != center_cell_idx
+                    and overlap >= self.horizontal_secondary_overlap_ratio
+                ]
+                center_cell = cell_bboxes[center_cell_idx]
+                center_w = center_cell[2] - center_cell[0]
+                width_overflow = (
+                    center_w > 0
+                    and ocr_width > center_w * self.ocr_bbox_width_overflow_ratio
+                )
+                if not other_high_overlaps and not other_horizontal_overlaps and not width_overflow:
                     return []
         
-        # Return the indices of all cells that meet the threshold
         return [idx for idx, _ in cell_overlaps]
     
     def second_pass_ocr_fill(
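
Review note: the hunk above narrows the "center cell dominates" exemption with two extra conditions, a lower neighbor-overlap threshold and a width-overflow check. The following is a minimal standalone sketch of that decision, not the repository's implementation. It assumes the overlap ratio is intersection area divided by OCR box area, that any positive overlap counts as overlapping, and that `ocr_width` is `x2 - x1` of the OCR box. None of these are visible in the hunk, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

BBox = Sequence[float]  # (x1, y1, x2, y2)


def overlap_ratio(ocr: BBox, cell: BBox) -> float:
    """Intersection area / OCR box area (assumed definition)."""
    w = min(ocr[2], cell[2]) - max(ocr[0], cell[0])
    h = min(ocr[3], cell[3]) - max(ocr[1], cell[1])
    area = (ocr[2] - ocr[0]) * (ocr[3] - ocr[1])
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def spanning_cells(
    ocr: BBox,
    cells: List[BBox],
    center_cell_ratio: float = 0.5,
    other_cell_max_ratio: float = 0.3,
    horizontal_secondary_overlap_ratio: float = 0.15,
    ocr_bbox_width_overflow_ratio: float = 1.08,
) -> List[int]:
    """Return indices of cells the OCR box spans, or [] if the center cell dominates."""
    cx, cy = (ocr[0] + ocr[2]) / 2, (ocr[1] + ocr[3]) / 2
    center_idx: Optional[int] = None
    overlaps = []
    for idx, cell in enumerate(cells):
        if center_idx is None and cell[0] <= cx <= cell[2] and cell[1] <= cy <= cell[3]:
            center_idx = idx
        ratio = overlap_ratio(ocr, cell)
        if ratio > 0:  # assumption: any positive overlap counts
            overlaps.append((idx, ratio))

    if center_idx is not None and overlaps:
        center_overlap = next((r for i, r in overlaps if i == center_idx), None)
        if center_overlap is not None and center_overlap >= center_cell_ratio:
            others = [r for i, r in overlaps if i != center_idx]
            high = any(r >= other_cell_max_ratio for r in others)
            horizontal = any(r >= horizontal_secondary_overlap_ratio for r in others)
            center_w = cells[center_idx][2] - cells[center_idx][0]
            ocr_w = ocr[2] - ocr[0]
            overflow = center_w > 0 and ocr_w > center_w * ocr_bbox_width_overflow_ratio
            if not high and not horizontal and not overflow:
                return []  # center cell dominates: not treated as cross-cell
    return [idx for idx, _ in overlaps]
```

With the default values shown in the diff, `horizontal_secondary_overlap_ratio` (0.15) is below `other_cell_max_ratio` (0.3), so any neighbor that trips the older check also trips the new one. In this sketch, a box with 75% overlap in its center cell and 25% in the neighbor was exempt under the old rule and is now reported as spanning both cells.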

+ 1 - 1
ocr_tools/universal_doc_parser/tests/test_glmocr_adapter.py

@@ -1,4 +1,4 @@
-#!/opt/miniconda3/envs/mineru2/bin/python
+#!/opt/miniconda3/envs/mineru/bin/python
 """Test GLM-OCR adapter loading
 
 Verifies:

+ 21 - 18
ocr_validator/config/global.yaml

@@ -145,21 +145,24 @@ pre_validation:
   out_dir: "./output/pre_validation/"
 
 data_sources:
-  - 德_内蒙古银行照.yaml
-  - 对公_招商银行图.yaml
-  - A用户_单元格扫描流水.yaml
-  - B用户_扫描流水.yaml
-  - 康强_北京农村商业银行.yaml
-  - 施博深.yaml
-  - 山西云集科技有限公司.yaml
-  - 至远彩色_2023年报.yaml
-  - 提取自赤峰黄金2023年报.yaml
-  - 乔_建设银行图.yaml
-  - 湛_平安银行图.yaml
-  - 朱_中信银行图.yaml
-  - 张_微信图.yaml
-  - 付_工商银行943825图.yaml
-  - 许_民生银行图.yaml
-  - 韩_中国银行图.yaml
-  - 杨万益_福建农信.yaml
-  - 严_农业银行.yaml
+  # - 德_内蒙古银行照.yaml
+  # - 对公_招商银行图.yaml
+  # - A用户_单元格扫描流水.yaml
+  # - B用户_扫描流水.yaml
+  # - 康强_北京农村商业银行.yaml
+  # - 施博深.yaml
+  # - 山西云集科技有限公司.yaml
+  # - 至远彩色_2023年报.yaml
+  # - 提取自赤峰黄金2023年报.yaml
+  # - 乔_建设银行图.yaml
+  # - 湛_平安银行图.yaml
+  # - 朱_中信银行图.yaml
+  # - 张_微信图.yaml
+  # - 付_工商银行943825图.yaml
+  # - 许_民生银行图.yaml
+  # - 韩_中国银行图.yaml
+  # - 杨万益_福建农信.yaml
+  # - 严_农业银行.yaml
+  - 陈3_微信图.yaml
+  - 彭_广东兴宁农村商业银行.yaml
+  - 钟_广东陆丰农村商业银行.yaml

+ 20 - 0
ocr_validator/config/彭_广东兴宁农村商业银行.yaml

@@ -0,0 +1,20 @@
+# Document: 彭_广东兴宁农村商业银行
+document:
+  name: "彭_广东兴宁农村商业银行"
+  base_dir: "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行"
+  
+  # 🎯 Key improvement: define the OCR tools used for this document and their result directories
+  ocr_results:
+    # bank_statement_yusys_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_local"
+      image_dir: "bank_statement_yusys_local/{{name}}"
+      description: "YUSYS-OCR framework (local) Wired UNET OCR GLM-OCR"
+      enabled: true
+
+    # bank_statement_glmocr_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_glmocr_local"
+      image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
+      description: "YUSYS-OCR framework (local) GLM-OCR VLM"
+      enabled: true

+ 20 - 0
ocr_validator/config/钟_广东陆丰农村商业银行.yaml

@@ -0,0 +1,20 @@
+# Document: 钟_广东陆丰农村商业银行
+document:
+  name: "钟_广东陆丰农村商业银行"
+  base_dir: "/Users/zhch158/workspace/data/流水分析/钟_广东陆丰农村商业银行"
+  
+  # 🎯 Key improvement: define the OCR tools used for this document and their result directories
+  ocr_results:
+    # bank_statement_yusys_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_local"
+      image_dir: "bank_statement_yusys_local/{{name}}"
+      description: "YUSYS-OCR framework (local) Wired UNET OCR GLM-OCR"
+      enabled: true
+
+    # bank_statement_glmocr_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_glmocr_local"
+      image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
+      description: "YUSYS-OCR framework (local) GLM-OCR VLM"
+      enabled: true

+ 20 - 0
ocr_validator/config/陈3_微信图.yaml

@@ -0,0 +1,20 @@
+# Document: 陈3_微信图
+document:
+  name: "陈3_微信图"
+  base_dir: "/Users/zhch158/workspace/data/流水分析/陈3_微信图"
+  
+  # 🎯 Key improvement: define the OCR tools used for this document and their result directories
+  ocr_results:
+    # bank_statement_yusys_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_local"
+      image_dir: "bank_statement_yusys_local/{{name}}"
+      description: "YUSYS-OCR framework (local) Wired UNET OCR GLM-OCR"
+      enabled: true
+
+    # bank_statement_glmocr_local
+    - tool: "mineru"
+      result_dir: "bank_statement_yusys_glmocr_local"
+      image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
+      description: "YUSYS-OCR framework (local) GLM-OCR VLM"
+      enabled: true
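
Review note: the three new document configs share one shape: a `document` block with `name`, `base_dir`, and a list of `ocr_results` entries. The sketch below shows one way such an entry might be resolved into concrete directories. It rests on two assumptions the diff does not confirm: that `{{name}}` is replaced by `document.name`, and that `result_dir` and `image_dir` are relative to `base_dir`. `resolve_ocr_dirs` is a hypothetical helper, and the config is given as an already-parsed dict so the example needs no YAML library.

```python
from pathlib import PurePosixPath


def resolve_ocr_dirs(document):
    """Expand each enabled ocr_results entry into paths under base_dir (assumed layout)."""
    name = document["name"]
    base = PurePosixPath(document["base_dir"])
    resolved = []
    for entry in document["ocr_results"]:
        if not entry.get("enabled", True):
            continue  # skip tools switched off in the config
        # Assumption: the {{name}} placeholder stands for the document name.
        image_dir = entry["image_dir"].replace("{{name}}", name)
        resolved.append(
            {
                "tool": entry["tool"],
                "result_dir": str(base / entry["result_dir"]),
                "image_dir": str(base / image_dir),
            }
        )
    return resolved


# Values copied from the 陈3_微信图.yaml hunk above; the second entry is
# disabled here only to show the `enabled` filter.
example_document = {
    "name": "陈3_微信图",
    "base_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图",
    "ocr_results": [
        {
            "tool": "mineru",
            "result_dir": "bank_statement_yusys_local",
            "image_dir": "bank_statement_yusys_local/{{name}}",
            "enabled": True,
        },
        {
            "tool": "mineru",
            "result_dir": "bank_statement_yusys_glmocr_local",
            "image_dir": "bank_statement_yusys_glmocr_local/{{name}}",
            "enabled": False,
        },
    ],
}
```

Calling `resolve_ocr_dirs(example_document)` yields one entry per enabled tool, which is the form a validator would need to locate result files and page images.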

+ 1 - 1
pyrightconfig.json

@@ -8,7 +8,7 @@
   "pythonPlatform": "Darwin",
   "typeCheckingMode": "basic",
   "venvPath": "/opt/miniconda3/envs",
-  "venv": "mineru2",
+  "venv": "mineru",
   "reportMissingImports": true,
   "reportMissingTypeStubs": false,
   "useLibraryCodeForTypes": true