22 Commits 35ee4abec4 ... cde2fb8faa

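Author SHA1 Message Date
  zhch158_admin cde2fb8faa fix(update watermark removal debug function): change save_watermark_removal_debug to use the unified debug output directory structure; update its docs and parameters to match module debug, making debugging clearer and more consistent. 1 month ago
  zhch158_admin 8427034b4c feat(optimize debug output directory): add resolve_module_debug_dir to unify the debug output directory structure; update related functions to support the new directory paths, improving the flexibility and maintainability of debugging. 1 month ago
  zhch158_admin d7e042807c feat(optimize debug option merging): update the debug_options merge logic in PaddleTableClassifier; add a default_subdir parameter to support a default subdirectory, improving the flexibility and maintainability of debug output paths. 1 month ago
  zhch158_admin 5a5b23b3a0 feat(optimize debug option merging): update the debug_options merge logic in MinerUWiredTableRecognizer; add a default_subdir parameter to support a default subdirectory, improving the flexibility and maintainability of debug output paths. 1 month ago
  zhch158_admin 07852d2774 fix(adjust watermark removal threshold): change the watermark removal threshold from 160 to 175 and add a comment explaining its dependency on contrast enhancement, improving image-processing accuracy and maintainability. 1 month ago
  zhch158_admin dec64903d5 feat(enhance debug options): add a subdir parameter to BasePreprocessor and BaseLayoutDetector and optimize the output directory configuration, improving the flexibility and maintainability of debugging. 1 month ago
  zhch158_admin c06dc3fb13 fix(update debug output directory): revise the output_dir parameter description in TextFiller and update the cell OCR debug directory path so the debug output directory is clear and consistent. 1 month ago
  zhch158_admin ada8334231 feat(enhance debug utilities): add resolve_debug_output_dir to WiredTableDebugUtils; optimize the debug option merge logic and support a default subdirectory, improving the flexibility and maintainability of debug output paths. 1 month ago
  zhch158_admin 20b05456ab feat(enhance debugging): add _build_table_module_debug_override to EnhancedDocPipeline for more flexible construction of debug options; update the debug output directory paths and optimize the table module's debug configuration, improving the customizability and accuracy of debugging. 1 month ago
  zhch158_admin bcc8a748b5 feat(enhance layout router): optimize the model config merge logic in SmartLayoutRouter, add debug_options support, and update the debug output directory paths, improving the flexibility and accuracy of debugging. 1 month ago
  zhch158_admin b2e8f25369 feat(enhance debug options): add a debug_options parameter to ElementProcessors for more flexible debug configuration; optimize how debug options are constructed, improving the customizability and accuracy of debugging. 1 month ago
  zhch158_admin adafec6488 fix(adjust debug output directory): change the debug output path in remove_watermark.py from debug_comparison/watermark_removal/ to debug/watermark_removal/ to unify the directory structure and improve maintainability. 1 month ago
  zhch158_admin 3867618ad1 feat(update bank statement config): disable watermark removal, add contrast enhancement and debug options, and optimize the debug output directory structure, improving the clarity and maintainability of the config. 1 month ago
  zhch158_admin 6a6e6ba69b fix(update config): disable watermark removal and adjust the debug output directory structure to support modular debugging, improving the clarity and maintainability of the config. 1 month ago
  zhch158_admin 0ad77c44e3 feat(enhance debugging): add watermark removal debug options to the config, optimize the output config to support module debugging, and update the sample input and page range, improving the flexibility and accuracy of debugging. 1 month ago
  zhch158_admin 52b9065e9b feat(enhance watermark processing): add several new functions to watermark_utils.py for watermark mask construction, dynamic-threshold watermark removal, contrast enhancement, and saving debug images, improving the flexibility and accuracy of watermark removal. 1 month ago
  zhch158_admin 9624e032a1 feat(add module-level debug visualization): add the ocr_utils/module_debug_viz.py module, which draws and saves layout and OCR debug images and supports JSON output, improving the visualization and auditability of debugging. 1 month ago
  zhch158_admin 57178ab8f2 feat(optimize watermark removal and orientation correction): extend remove_watermark to support optional contrast enhancement, improve correct_orientation to handle PDF rotation and the orientation classifier, and optimize process to support cropped-block processing, improving the flexibility and accuracy of OCR image preprocessing. 1 month ago
  zhch158_admin 1c67a0d785 feat(enhance image preprocessing): add watermark debug options and image-processing order configuration to BasePreprocessor; optimize the orientation correction and watermark removal flow, improving the flexibility and accuracy of OCR processing. 1 month ago
  zhch158_admin 92b9d902ee feat(enhance layout router and document pipeline): add layout debug context propagation to SmartLayoutRouter and optimize the model detection flow; in EnhancedDocPipeline, improve page preprocessing, inject the watermark debug context, and enhance OCR debug options, improving processing flexibility and accuracy. 1 month ago
  zhch158_admin ad60ed5eca feat(update bank statement config): modify bank_statement_glm_vl_local.yaml to disable watermark removal; update bank_statement_yusys_local.yaml to enhance the watermark processing config, adding new parameters and debug options, improving processing flexibility and accuracy. 1 month ago
  zhch158_admin 5bbe299ec9 feat(optimize watermark removal tool): update remove_watermark.py with enhanced command-line argument support, debug image saving, and integrated watermark processing config, improving processing flexibility and accuracy. 1 month ago
21 changed files with 2933 additions and 541 deletions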
  1. 534 70
      ocr_tools/remove_watermark_tool/remove_watermark.py
  2. 45 8
      ocr_tools/universal_doc_parser/config/bank_statement_glm_vl.yaml
  3. 61 21
      ocr_tools/universal_doc_parser/config/bank_statement_glm_vl_local.yaml
  4. 46 10
      ocr_tools/universal_doc_parser/config/bank_statement_mineru_vl.yaml
  5. 45 9
      ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl.yaml
  6. 61 21
      ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl_local.yaml
  7. 59 25
      ocr_tools/universal_doc_parser/config/bank_statement_smart_router.yaml
  8. 105 21
      ocr_tools/universal_doc_parser/config/bank_statement_yusys_local.yaml
  9. 61 21
      ocr_tools/universal_doc_parser/config/bank_statement_yusys_v4.yaml
  10. 10 8
      ocr_tools/universal_doc_parser/core/element_processors.py
  11. 44 30
      ocr_tools/universal_doc_parser/core/layout_model_router.py
  12. 131 51
      ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py
  13. 18 8
      ocr_tools/universal_doc_parser/main_v2.py
  14. 200 158
      ocr_tools/universal_doc_parser/models/adapters/base.py
  15. 62 22
      ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py
  16. 20 11
      ocr_tools/universal_doc_parser/models/adapters/mineru_wired_table.py
  17. 6 3
      ocr_tools/universal_doc_parser/models/adapters/paddle_table_classifier.py
  18. 38 39
      ocr_tools/universal_doc_parser/models/adapters/wired_table/debug_utils.py
  19. 2 3
      ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py
  20. 231 0
      ocr_utils/module_debug_viz.py
  21. 1154 2
      ocr_utils/watermark_utils.py

+ 534 - 70
ocr_tools/remove_watermark_tool/remove_watermark.py

@@ -2,35 +2,42 @@
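 Bank statement watermark removal tool
 
 Supports PDF and common image formats (jpg/png/tif/bmp/webp).
-- PDF input → watermark-free PDF output (scanned PDFs) or a direct copy (text-based PDFs)
-- Image input → watermark-free image output (original format preserved)
-Intended for bank statements with semi-transparent text watermarks, such as those from Fujian Rural Credit Union and Postal Savings Bank of China.
+Parameters are read by default from the same scenario YAML as main_v2 (preprocessor.watermark_removal);
+the command line is used only for input/output, batch mode, preview, and a few overrides.
 
 Usage:
-    # Process a single PDF or image
+    # Use the default scenario config (bank_statement_yusys_local.yaml)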
     python remove_watermark.py input.pdf
-    python remove_watermark.py input.jpg
 
-    # Specify the output path
-    python remove_watermark.py input.pdf -o output.pdf
+    # Specify a scenario config (the same one the Pipeline uses)
+    python remove_watermark.py input.png -c ../universal_doc_parser/config/bank_statement_yusys_local.yaml
 
-    # Specify a page range (supports the "1-5,7,9-12" format)
-    python remove_watermark.py input.pdf --page-range 1-3
+    # Save debug images (before/after/compare/meta)
+    python remove_watermark.py input.png -o ./out --debug
 
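-    # Adjust the removal threshold (default 160; suggested range 140-180)
+    # Temporarily override the threshold (everything else still comes from the config file)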
     python remove_watermark.py input.pdf --threshold 170
 
-    # Batch-process all PDFs and images in a directory
-    python remove_watermark.py /path/to/dir/ --batch
-
-    # Preview the result for a single page/image (nothing is saved; a comparison figure is shown)
+    # Preview
     python remove_watermark.py input.pdf --preview --page 0
-    python remove_watermark.py input.jpg --preview
+
+    # Batch
+    python remove_watermark.py /path/to/dir/ --batch -o ./cleaned
+
+    # Compare threshold vs masked_adaptive (outputs a three-panel image)
+    python remove_watermark.py page_002.png --compare-methods -o ./method_compare
 """
 import argparse
+import copy
+import json
 import sys
+from dataclasses import dataclass
 from pathlib import Path
-from typing import Optional
+from typing import Any, Dict, Optional
+
+import cv2
+import numpy as np
+import yaml
 
 # Add the ocr_platform root directory to sys.path so that ocr_utils can be imported
 _repo_root = Path(__file__).parents[2]
@@ -40,7 +47,10 @@ if str(_repo_root) not in sys.path:
 from loguru import logger
 from ocr_utils.watermark_utils import (
     detect_watermark,
-    remove_watermark_from_image,
+    remove_watermark_from_image_rgb,
+    render_watermark_mask_overlay,
+    save_watermark_removal_debug,
+    save_watermark_mask_debug_layers,
     scan_pdf_watermark_xobjs,
     remove_txt_pdf_watermark,
 )
@@ -48,6 +58,190 @@ from ocr_utils.watermark_utils import (
 # Supported image suffixes (lowercase)
 IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp"}
 
+_DEFAULT_CONFIG_PATH = (
+    _repo_root
+    / "ocr_tools/universal_doc_parser/config/bank_statement_yusys_local.yaml"
+)
+
+
+@dataclass
+class WatermarkToolSettings:
+    """Watermark processing parameters parsed from the scenario YAML (aligned with the Pipeline preprocessor)."""
+
+    threshold: int = 160
+    morph_close_kernel: int = 0
+    dpi: int = 200
+    method: str = "threshold"
+    contrast_enhancement: Optional[Dict[str, Any]] = None
+    debug_options: Optional[Dict[str, Any]] = None
+    watermark_enabled: bool = True
+    watermark_config: Optional[Dict[str, Any]] = None
+
+    @property
+    def debug_image_format(self) -> str:
+        opts = self.debug_options or {}
+        return str(opts.get("image_format") or "png").lstrip(".")
+
+
+def load_watermark_settings(config_path: Path) -> WatermarkToolSettings:
+    """
+    Read preprocessor.watermark_removal and input.dpi from a universal_doc_parser scenario config.
+
+    Does not depend on the full ConfigManager, so debugging watermarks alone does not require the layout/ocr sections.
+    """
+    config_path = Path(config_path)
+    if not config_path.is_file():
+        raise FileNotFoundError(f"Config file not found: {config_path}")
+
+    with open(config_path, encoding="utf-8") as f:
+        raw = yaml.safe_load(f) or {}
+
+    preprocessor = raw.get("preprocessor") or {}
+    wm = preprocessor.get("watermark_removal") or {}
+    input_cfg = raw.get("input") or {}
+
+    contrast = wm.get("contrast_enhancement")
+    if contrast is not None and not isinstance(contrast, dict):
+        contrast = None
+
+    wm_full = copy.deepcopy(wm)
+    return WatermarkToolSettings(
+        threshold=int(wm.get("threshold", 160)),
+        morph_close_kernel=int(wm.get("morph_close_kernel", 0)),
+        dpi=int(input_cfg.get("dpi", 200)),
+        method=str(wm.get("method") or "threshold"),
+        contrast_enhancement=copy.deepcopy(contrast) if contrast else None,
+        debug_options=copy.deepcopy(wm.get("debug_options"))
+        if wm.get("debug_options")
+        else None,
+        watermark_enabled=bool(wm.get("enabled", True)),
+        watermark_config=wm_full,
+    )
+
+
+def resolve_watermark_settings(
+    config_path: Path,
+    *,
+    threshold: Optional[int] = None,
+    morph_close_kernel: Optional[int] = None,
+    dpi: Optional[int] = None,
+    no_contrast: bool = False,
+    text_black_target: Optional[int] = None,
+    method: Optional[str] = None,
+) -> WatermarkToolSettings:
+    """Load the config and apply command-line overrides."""
+    settings = load_watermark_settings(config_path)
+
+    if threshold is not None:
+        settings.threshold = threshold
+    if morph_close_kernel is not None:
+        settings.morph_close_kernel = morph_close_kernel
+    if dpi is not None:
+        settings.dpi = dpi
+    if method is not None:
+        settings.method = method
+        if settings.watermark_config is not None:
+            settings.watermark_config["method"] = method
+
+    if no_contrast and settings.contrast_enhancement:
+        settings.contrast_enhancement = copy.deepcopy(settings.contrast_enhancement)
+        settings.contrast_enhancement["enabled"] = False
+    elif text_black_target is not None:
+        if not settings.contrast_enhancement:
+            settings.contrast_enhancement = {"enabled": True, "method": "text_restore"}
+        else:
+            settings.contrast_enhancement = copy.deepcopy(settings.contrast_enhancement)
+        settings.contrast_enhancement["enabled"] = True
+        settings.contrast_enhancement["text_black_target"] = text_black_target
+
+    return settings
+
+
+def _watermark_removal_cfg_for_method(
+    settings: WatermarkToolSettings,
+    method: str,
+) -> Dict[str, Any]:
+    """Build a copy of the watermark_removal config for the given method."""
+    cfg = copy.deepcopy(settings.watermark_config or {})
+    cfg["method"] = method
+    cfg["threshold"] = settings.threshold
+    cfg["morph_close_kernel"] = settings.morph_close_kernel
+    return cfg
+
+
+def _apply_image_watermark_removal(
+    img_np: np.ndarray,
+    *,
+    settings: WatermarkToolSettings,
+    contrast_enhancement: Optional[Dict[str, Any]] = None,
+    apply_watermark_removal: bool = True,
+    removal_debug: Optional[Dict[str, Any]] = None,
+) -> np.ndarray:
+    """RGB watermark removal plus optional contrast enhancement, consistent with universal_doc_parser."""
+    wm_cfg = _watermark_removal_cfg_for_method(settings, settings.method)
+    return np.asarray(
+        remove_watermark_from_image_rgb(
+            img_np,
+            threshold=settings.threshold,
+            morph_close_kernel=settings.morph_close_kernel,
+            contrast_enhancement=contrast_enhancement,
+            apply_watermark_removal=apply_watermark_removal,
+            watermark_removal_cfg=wm_cfg,
+            removal_debug=removal_debug,
+            return_pil=False,
+        )
+    )
+
+
+def _active_contrast_enhancement(
+    settings: WatermarkToolSettings,
+) -> Optional[Dict[str, Any]]:
+    ce = settings.contrast_enhancement
+    if not ce or not ce.get("enabled", False):
+        return None
+    return ce
+
+
+def _maybe_save_watermark_debug(
+    before: np.ndarray,
+    after: np.ndarray,
+    debug_output_dir: Path,
+    page_name: str,
+    *,
+    settings: WatermarkToolSettings,
+    contrast_enhancement: Optional[Dict[str, Any]] = None,
+    removal_debug: Optional[Dict[str, Any]] = None,
+) -> None:
+    """Save debug images to debug/watermark_removal/ (same layout as the pipeline)."""
+    params: Dict[str, Any] = {
+        "method": settings.method,
+        "threshold": settings.threshold,
+        "morph_close_kernel": settings.morph_close_kernel,
+    }
+    if contrast_enhancement:
+        params["contrast_enhancement"] = contrast_enhancement
+    if removal_debug:
+        for key in ("mode", "T_wm", "T_protect", "wm_mask_ratio", "white_pixel_ratio"):
+            if key in removal_debug:
+                params[key] = removal_debug[key]
+
+    mask_overlay = None
+    if removal_debug and "wm_mask" in removal_debug:
+        mask_overlay = render_watermark_mask_overlay(
+            before, removal_debug["wm_mask"]
+        )
+
+    save_watermark_removal_debug(
+        before,
+        after,
+        debug_output_dir,
+        page_name,
+        processing_params=params,
+        image_format=settings.debug_image_format,
+        save_compare=True,
+        mask_overlay=mask_overlay,
+    )
+
 
 def _try_remove_txt_pdf_watermark(input_path: Path, output_path: Path) -> int:
     """
@@ -81,11 +275,12 @@ def _try_remove_txt_pdf_watermark(input_path: Path, output_path: Path) -> int:
 def process_document(
     input_path: Path,
     output_path: Path,
-    threshold: int = 160,
-    morph_close_kernel: int = 0,
-    dpi: int = 200,
+    settings: WatermarkToolSettings,
     page_range: Optional[str] = None,
     force_image: bool = False,
+    save_debug: bool = False,
+    debug_output_dir: Optional[Path] = None,
+    apply_watermark_removal: Optional[bool] = None,
 ) -> int:
     """
     Unified processing function: supports PDFs (scanned) and images; removes the watermark and saves the result.
@@ -99,23 +294,27 @@ def process_document(
     Args:
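         input_path: input file path (PDF or image)
         output_path: output file path
-        threshold: grayscale threshold (140-180); higher is more conservative, lower is more aggressive
-        morph_close_kernel: kernel size for morphological closing; 0 skips it
-        dpi: PDF rendering resolution
+        settings: watermark settings (including method / threshold / mask / adaptive)
         page_range: page range string, e.g. "1-5,7,9-12" (1-based; PDF only)
         force_image: force image-based processing for text-based PDFs (text searchability is lost,
                      but watermarks embedded in the content stream can be handled)
+        save_debug: whether to save before/after/compare/meta to debug/watermark_removal/
+        debug_output_dir: root directory for debug images; defaults to the parent of output_path
+        apply_watermark_removal: defaults to settings.watermark_enabled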
 
     Returns:
         number of pages/images actually processed
     """
     import shutil
-    import numpy as np
     from io import BytesIO
     from PIL import Image
     from ocr_utils.pdf_utils import PDFUtils
 
     is_pdf = input_path.suffix.lower() == ".pdf"
+    dpi = settings.dpi
+    contrast_enhancement = _active_contrast_enhancement(settings)
+    if apply_watermark_removal is None:
+        apply_watermark_removal = settings.watermark_enabled
 
     # Unified loading + classification (PDFs use MinerU pdf_classify; images are read directly)
     images, pdf_type, pdf_doc, renderer = PDFUtils.load_and_classify_document(
@@ -161,12 +360,22 @@ def process_document(
 
     logger.info(
         f"{'📄' if is_pdf else '🖼️ '} Processing: {input_path.name}  "
-        f"{len(images)} {'page(s)' if is_pdf else 'image(s)'} total  threshold={threshold}"
+        f"{len(images)} {'page(s)' if is_pdf else 'image(s)'} total  "
+        f"method={settings.method} threshold={settings.threshold}"
+    )
+
+    contrast_only = (
+        not apply_watermark_removal
+        and contrast_enhancement
+        and contrast_enhancement.get("enabled", False)
     )
 
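     # Watermark detection (only the first page/image is checked; a document's watermark is usually consistent)
     # Skip re-detection when _known_has_wm was already set in the txt branch
-    if _known_has_wm is not None:
+    if contrast_only:
+        has_wm = True
+        logger.info("📋 Watermark removal is disabled in the config; applying contrast_enhancement only")
+    elif _known_has_wm is not None:
         has_wm = _known_has_wm
         logger.info("🔍 Watermark detected; starting removal" if has_wm else "✅ No watermark detected; skipping")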
     else:
@@ -185,6 +394,7 @@ def process_document(
                 return 1
 
     output_path.parent.mkdir(parents=True, exist_ok=True)
+    debug_root = debug_output_dir or output_path.parent
 
     if is_pdf:
         # Process page by page, then repack into a PDF
@@ -197,13 +407,31 @@ def process_document(
         for i, img_dict in enumerate(images):
             pil_img = img_dict["img_pil"]
             img_np = np.array(pil_img)
+            page_name = f"{input_path.stem}_page_{i + 1:03d}"
 
             if has_wm:
-                cleaned_gray = remove_watermark_from_image(
-                    img_np, threshold=threshold,
-                    morph_close_kernel=morph_close_kernel, return_pil=False,
+                before = img_np.copy()
+                removal_dbg: Dict[str, Any] = {}
+                cleaned_rgb = _apply_image_watermark_removal(
+                    img_np,
+                    settings=settings,
+                    contrast_enhancement=contrast_enhancement,
+                    apply_watermark_removal=apply_watermark_removal,
+                    removal_debug=removal_dbg,
+                )
+                if save_debug:
+                    _maybe_save_watermark_debug(
+                        before,
+                        cleaned_rgb,
+                        debug_root,
+                        page_name,
+                        settings=settings,
+                        contrast_enhancement=contrast_enhancement,
+                        removal_debug=removal_dbg,
+                    )
+                out_pil = Image.fromarray(
+                    cv2.cvtColor(cleaned_rgb, cv2.COLOR_BGR2RGB)
                 )
-                out_pil = Image.fromarray(cleaned_gray).convert("RGB")
             else:
                 out_pil = pil_img
 
@@ -223,11 +451,27 @@ def process_document(
     else:
         # Image: remove the watermark if present, then save
         img_np = np.array(images[0]["img_pil"])
-        cleaned_gray = remove_watermark_from_image(
-            img_np, threshold=threshold,
-            morph_close_kernel=morph_close_kernel, return_pil=False,
+        before = img_np.copy()
+        removal_dbg = {}
+        cleaned_rgb = _apply_image_watermark_removal(
+            img_np,
+            settings=settings,
+            contrast_enhancement=contrast_enhancement,
+            apply_watermark_removal=apply_watermark_removal,
+            removal_debug=removal_dbg,
         )
-        Image.fromarray(cleaned_gray, mode="L").save(str(output_path))
+        if save_debug:
+            _maybe_save_watermark_debug(
+                before,
+                cleaned_rgb,
+                debug_root,
+                input_path.stem,
+                settings=settings,
+                contrast_enhancement=contrast_enhancement,
+                removal_debug=removal_dbg,
+            )
+        out_rgb = cv2.cvtColor(cleaned_rgb, cv2.COLOR_BGR2RGB)
+        Image.fromarray(out_rgb).save(str(output_path))
 
     logger.info(f"✅ Saved to: {output_path}")
     return len(images)
@@ -235,13 +479,11 @@ def process_document(
 
 def preview_page(
     input_path: Path,
+    settings: WatermarkToolSettings,
     page_idx: int = 0,
-    threshold: int = 160,
-    dpi: int = 200,
 ):
     """Show a single page's original next to the watermark-free result (requires matplotlib). Supports PDF and image files."""
     try:
-        import numpy as np
         import matplotlib.pyplot as plt
         import matplotlib
         matplotlib.rcParams['font.sans-serif'] = ['Arial Unicode MS', 'SimHei', 'sans-serif']
@@ -259,7 +501,7 @@ def preview_page(
         doc = fitz.open(str(input_path))
         if page_idx >= len(doc):
             raise ValueError(f"Page index {page_idx} is out of range ({len(doc)} pages total)")
-        mat = fitz.Matrix(dpi / 72, dpi / 72)
+        mat = fitz.Matrix(settings.dpi / 72, settings.dpi / 72)
         page = doc[page_idx]
         pix = page.get_pixmap(matrix=mat, alpha=False)
         img_np = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.h, pix.w, 3)
@@ -271,56 +513,261 @@ def preview_page(
     else:
         raise ValueError(f"Unsupported file format: {suffix}")
 
-    cleaned = remove_watermark_from_image(img_np, threshold=threshold, return_pil=False)
+    contrast = _active_contrast_enhancement(settings)
+    cleaned_rgb = _apply_image_watermark_removal(
+        img_np,
+        settings=settings,
+        contrast_enhancement=contrast,
+        apply_watermark_removal=settings.watermark_enabled,
+    )
+    cleaned = cv2.cvtColor(cleaned_rgb, cv2.COLOR_BGR2GRAY)
 
     fig, axes = plt.subplots(1, 2, figsize=(20, 14))
     axes[0].imshow(img_np)
     axes[0].set_title(title_orig, fontsize=14)
     axes[0].axis('off')
 
+    subtitle = f"method={settings.method}, threshold={settings.threshold}"
+    if contrast:
+        subtitle += f", contrast={contrast.get('method', 'on')}"
     axes[1].imshow(cleaned, cmap='gray')
-    axes[1].set_title(f"After watermark removal  threshold={threshold}", fontsize=14)
+    axes[1].set_title(f"After watermark removal  {subtitle}", fontsize=14)
     axes[1].axis('off')
 
     plt.tight_layout()
     plt.show()
 
 
+def _run_process_document(
+    input_path: Path,
+    output_path: Path,
+    settings: WatermarkToolSettings,
+    *,
+    page_range: Optional[str] = None,
+    force_image: bool = False,
+    save_debug: bool = False,
+    debug_output_dir: Optional[Path] = None,
+) -> int:
+    return process_document(
+        input_path,
+        output_path,
+        settings,
+        page_range=page_range,
+        force_image=force_image,
+        save_debug=save_debug,
+        debug_output_dir=debug_output_dir,
+    )
+
+
+def compare_watermark_methods(
+    input_path: Path,
+    output_dir: Path,
+    settings: WatermarkToolSettings,
+) -> Dict[str, str]:
+    """
+    Compare threshold and masked_adaptive on the same image; outputs a three-panel image and meta.
+
+    Returns:
+        paths of each output file
+    """
+    from PIL import Image
+
+    output_dir.mkdir(parents=True, exist_ok=True)
+    stem = input_path.stem
+    img_rgb = np.array(Image.open(str(input_path)).convert("RGB"))
+    contrast = _active_contrast_enhancement(settings)
+
+    paths: Dict[str, str] = {}
+    results: Dict[str, np.ndarray] = {}
+
+    for method in ("threshold", "masked_adaptive"):
+        sub = copy.deepcopy(settings)
+        sub.method = method
+        dbg: Dict[str, Any] = {}
+        out = _apply_image_watermark_removal(
+            img_rgb,
+            settings=sub,
+            contrast_enhancement=contrast,
+            removal_debug=dbg,
+        )
+        out_rgb = cv2.cvtColor(out, cv2.COLOR_BGR2RGB)
+        results[method] = out_rgb
+        out_path = output_dir / f"{stem}_cleaned_{method}.png"
+        Image.fromarray(out_rgb).save(str(out_path))
+        paths[method] = str(out_path)
+        meta_path = output_dir / f"{stem}_meta_{method}.json"
+        meta = {
+            "method": method,
+            "threshold": settings.threshold,
+            "mask_mode": dbg.get("mask_mode"),
+            "direction_filter": dbg.get("direction_filter"),
+            "whiten_mode": dbg.get("whiten_mode"),
+            "T_wm": dbg.get("T_wm"),
+            "T_protect": dbg.get("T_protect"),
+            "mode": dbg.get("mode"),
+            "midtone_ratio": dbg.get("midtone_ratio"),
+            "wm_candidate_ratio": dbg.get("wm_candidate_ratio"),
+            "geom_mask_ratio": dbg.get("geom_mask_ratio"),
+            "geom_candidate_ratio": dbg.get("geom_candidate_ratio"),
+            "wm_mask_ratio": dbg.get("wm_mask_ratio"),
+            "white_pixel_ratio": dbg.get("white_pixel_ratio"),
+            "hough_kept_lines": dbg.get("hough_kept_lines"),
+            "hough_diag_candidates": dbg.get("hough_diag_candidates"),
+            "hough_total_lines": dbg.get("hough_total_lines"),
+            "dominant_angles": dbg.get("dominant_angles"),
+            "whiten_gray_low": dbg.get("whiten_gray_low"),
+        }
+        meta_path.write_text(
+            json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8"
+        )
+        paths[f"meta_{method}"] = str(meta_path)
+        if method == "masked_adaptive":
+            layer_paths = save_watermark_mask_debug_layers(
+                img_rgb, output_dir, stem, dbg, image_format="png"
+            )
+            paths.update(layer_paths)
+
+    h = max(results["threshold"].shape[0], results["masked_adaptive"].shape[0])
+
+    def _resize_rgb(arr: np.ndarray) -> np.ndarray:
+        if arr.shape[0] == h:
+            return arr
+        scale = h / arr.shape[0]
+        w = int(arr.shape[1] * scale)
+        return cv2.resize(arr, (w, h))
+
+    triple = np.hstack(
+        [_resize_rgb(img_rgb)]
+        + [_resize_rgb(results[m]) for m in ("threshold", "masked_adaptive")]
+    )
+    compare_path = output_dir / f"{stem}_compare_orig_threshold_masked.png"
+    cv2.imwrite(
+        str(compare_path),
+        cv2.cvtColor(triple, cv2.COLOR_RGB2BGR),
+    )
+    paths["compare_triple"] = str(compare_path)
+    logger.info(f"✅ Method comparison saved: {compare_path}")
+    return paths
+
+
 def main():
     parser = argparse.ArgumentParser(
-        description="Bank statement watermark removal tool",
+        description="Bank statement watermark removal tool (parameters default to the scenario YAML, consistent with the main_v2 Pipeline)",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog=__doc__,
     )
     parser.add_argument("input", type=Path, help="Input PDF / image file, or a directory (batch mode)")
-    parser.add_argument("-o", "--output", type=Path, default=None,
-                        help="Output path (single-file mode; defaults to the original file name with _cleaned appended)")
-    parser.add_argument("--threshold", type=int, default=160,
-                        help="Grayscale threshold (140-180), default 160")
-    parser.add_argument("--morph-kernel", type=int, default=2,
-                        help="Kernel size for morphological closing; 0 skips it; default 2")
-    parser.add_argument("--dpi", type=int, default=200,
-                        help="Rendering DPI, default 200")
-    parser.add_argument("--batch", action="store_true",
-                        help="Batch mode: process all PDFs and images in the directory")
-    parser.add_argument("--preview", action="store_true",
-                        help="Preview mode: show a single-page comparison figure (nothing is saved)")
-    parser.add_argument("--page", type=int, default=0,
-                        help="Preview page index (0-based), default page 0")
-    parser.add_argument("--page-range", type=str, default=None,
-                        help="Page range to process, e.g. '1-3,5,7-9' (1-based; PDF only)")
-    parser.add_argument("--force-image", action="store_true",
-                        help="Force image-based processing for text-based PDFs (searchability is lost; for inline watermarks that the XObject method cannot remove)")
+    parser.add_argument(
+        "-c",
+        "--config",
+        type=Path,
+        default=_DEFAULT_CONFIG_PATH,
+        help=f"Scenario config file; preprocessor.watermark_removal is read from it (default: {_DEFAULT_CONFIG_PATH.name})",
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        type=Path,
+        default=None,
+        help="Output path (single-file mode; defaults to the original file name with _cleaned appended)",
+    )
+    parser.add_argument("--batch", action="store_true", help="Batch-process all PDFs and images in the directory")
+    parser.add_argument("--preview", action="store_true", help="Preview mode: show a single-page comparison figure (nothing is saved)")
+    parser.add_argument("--page", type=int, default=0, help="Preview page index (0-based)")
+    parser.add_argument(
+        "--page-range",
+        type=str,
+        default=None,
+        help="PDF page range, e.g. '1-3,5,7-9' (1-based)",
+    )
+    parser.add_argument(
+        "--force-image",
+        action="store_true",
+        help="Force image-based watermark removal for text-based PDFs (searchability is lost)",
+    )
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        help="Save debug images to debug/watermark_removal/",
+    )
+    parser.add_argument(
+        "--debug-dir",
+        type=Path,
+        default=None,
+        help="Root directory for debug images (defaults to the parent of -o; for the format, see debug_options.image_format in the config file)",
+    )
+    # The following are a few knobs that override the config file (when omitted, the YAML is used as-is)
+    override = parser.add_argument_group("Config file overrides (optional)")
+    override.add_argument(
+        "--threshold",
+        type=int,
+        default=None,
+        help="Override watermark_removal.threshold (140-180)",
+    )
+    override.add_argument(
+        "--morph-kernel",
+        type=int,
+        default=None,
+        help="Override watermark_removal.morph_close_kernel",
+    )
+    override.add_argument("--dpi", type=int, default=None, help="Override input.dpi")
+    override.add_argument("--no-contrast", action="store_true", help="Disable contrast_enhancement")
+    override.add_argument(
+        "--text-black-target",
+        type=int,
+        default=None,
+        help="Override contrast_enhancement.text_black_target (text_restore)",
+    )
+    override.add_argument(
+        "--method",
+        type=str,
+        default=None,
+        choices=["threshold", "masked", "masked_adaptive"],
+        help="Override watermark_removal.method",
+    )
+    parser.add_argument(
+        "--compare-methods",
+        action="store_true",
+        help="Compare threshold and masked_adaptive; writes a three-panel image to the -o directory",
+    )
 
     args = parser.parse_args()
 
-    if args.preview:
-        preview_page(
-            args.input,
-            page_idx=args.page,
+    try:
+        settings = resolve_watermark_settings(
+            args.config,
             threshold=args.threshold,
+            morph_close_kernel=args.morph_kernel,
             dpi=args.dpi,
+            no_contrast=args.no_contrast,
+            text_black_target=args.text_black_target,
+            method=args.method,
         )
+    except FileNotFoundError as e:
+        logger.error(str(e))
+        sys.exit(1)
+
+    logger.info(
+        f"📋 配置: {args.config} | method={settings.method} | "
+        f"threshold={settings.threshold} | morph_kernel={settings.morph_close_kernel} | "
+        f"dpi={settings.dpi} | contrast={settings.contrast_enhancement}"
+    )
+
+    if args.compare_methods:
+        input_path = args.input
+        if not input_path.is_file():
+            logger.error(f"File not found: {input_path}")
+            sys.exit(1)
+        out_dir = args.output or (
+            input_path.parent / "debug" / "watermark_method_compare"
+        )
+        paths = compare_watermark_methods(input_path, out_dir, settings)
+        for k, v in paths.items():
+            logger.info(f"  {k}: {v}")
+        return
+
+    if args.preview:
+        preview_page(args.input, settings, page_idx=args.page)
         return
 
     if args.batch:
@@ -345,7 +792,15 @@ def main():
         for file in all_files:
             out_file = out_dir / f"{file.stem}_cleaned{file.suffix}"
             try:
-                process_document(file, out_file, args.threshold, args.morph_kernel, args.dpi, args.page_range, args.force_image)
+                _run_process_document(
+                    file,
+                    out_file,
+                    settings,
+                    page_range=args.page_range,
+                    force_image=args.force_image,
+                    save_debug=args.debug,
+                    debug_output_dir=args.debug_dir or out_dir,
+                )
             except Exception as e:
                 logger.error(f"❌ Failed to process {file.name}: {e}")
         logger.info(f"✅ Batch processing finished: {len(all_files)} files -> {out_dir}")
@@ -360,7 +815,15 @@ def main():
         )
         suffix = input_path.suffix.lower()
         if suffix == ".pdf" or suffix in IMAGE_SUFFIXES:
-            process_document(input_path, output_path, args.threshold, args.morph_kernel, args.dpi, args.page_range, args.force_image)
+            _run_process_document(
+                input_path,
+                output_path,
+                settings,
+                page_range=args.page_range,
+                force_image=args.force_image,
+                save_debug=args.debug,
+                debug_output_dir=args.debug_dir or output_path.parent,
+            )
         else:
             logger.error(f"Unsupported file format: {suffix}; PDF and {IMAGE_SUFFIXES} are supported")
             sys.exit(1)
@@ -379,14 +842,15 @@ if __name__ == "__main__":
             # Text-based PDF test
             # "input": "/Users/zhch158/workspace/data/流水分析/提取自赤峰黄金2023年报.pdf",
             # "input": "/Users/zhch158/workspace/data/测试文字PDF-水印.pdf",
-            "input": "/Users/zhch158/workspace/data/非结构化文档识别统一平台(ocr_platform)-交易流水识别,财报识别.pdf",
+            # "input": "/Users/zhch158/workspace/data/非结构化文档识别统一平台(ocr_platform)-交易流水识别,财报识别.pdf",
+            "input": "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行/bank_statement_yusys_local/彭_广东兴宁农村商业银行/彭_广东兴宁农村商业银行_page_002.png",
             # "output": "./output/杨万益_福建农信",
             # Page range (optional; supports the "1-5,7" syntax; applies to PDF only)
             # "page_range": "3",  # process only page 3 (maps to the --page-range argument)
-            "dpi": 200,
-            "threshold": 160,
-            "morph_kernel": 0,  # closing is not needed in mask-replacement mode
-            # "preview": True,
+            "config": str(_DEFAULT_CONFIG_PATH),
+            "preview": True,
+            "debug": True,
+            "compare-methods": True,
         }
 
         # Build the arguments (note: input is positional; morph_kernel maps to --morph-kernel)
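
Review note: both call sites above fall back to the output location when `--debug-dir` is not given (`args.debug_dir or out_dir` and `args.debug_dir or output_path.parent`), and the YAML configs in this change set describe per-module output under `debug/<subdir>/`. The sketch below shows one way those two rules could combine. `sketch_debug_dir` is a hypothetical helper written only for illustration; it is not the repository's `resolve_module_debug_dir`, whose signature does not appear in this diff.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def sketch_debug_dir(
    debug_dir: Optional[PathLike],
    output_dir: PathLike,
    subdir: str = "",
) -> Path:
    """Hypothetical helper: choose the debug root, then append debug/<subdir>."""
    # Same fallback as `args.debug_dir or out_dir`: None and "" both fall through.
    base = Path(debug_dir or output_dir)
    target = base / "debug"
    # An empty subdir means "write directly under debug/" (an assumption).
    return target / subdir if subdir else target


if __name__ == "__main__":
    print(sketch_debug_dir(None, "output", "watermark_removal"))
    print(sketch_debug_dir("my_debug", "output", "layout_detection"))
```

Because the fallback uses `or`, an empty string passed as `--debug-dir` behaves like an omitted flag. That matches the expression in the diff, but may or may not be the intended behaviour.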

+ 45 - 8
ocr_tools/universal_doc_parser/config/bank_statement_glm_vl.yaml

@@ -11,6 +11,8 @@ input:
 
 preprocessor:
   module: "mineru"
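+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior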
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -21,10 +23,30 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
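-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png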
 
 # ============================================================
 # Layout detection - uses PP-DocLayoutV3
@@ -46,11 +68,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
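-  # Debug visualization settings
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: true               # whether to enable debug visualization output
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: true              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"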
 
 # ============================================================
 # VL recognition - uses GLM-OCR
@@ -103,6 +130,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Output settings
 # ============================================================
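
Review note: the removed comment on `threshold` describes the global method as "pixels above the gray threshold are treated as watermark and set to white". The sketch below illustrates that rule on a grayscale image held as nested lists. `whiten_above_threshold` is a hypothetical name, and the repository's actual implementation (not shown in this diff) presumably works on image arrays instead.

```python
from typing import List

GrayImage = List[List[int]]  # rows of 0-255 gray values


def whiten_above_threshold(gray: GrayImage, threshold: int = 175) -> GrayImage:
    """Set every pixel brighter than `threshold` to white (255).

    Darker pixels (body text) are left untouched. A higher threshold
    whitens fewer pixels, so more watermark may survive; a lower one
    whitens more, so light body text may be lost.
    """
    return [[255 if px > threshold else px for px in row] for row in gray]


if __name__ == "__main__":
    page = [
        [40, 170, 180],   # dark text, mid-gray, light watermark
        [250, 160, 176],  # paper, mid-gray, light watermark
    ]
    print(whiten_above_threshold(page, threshold=175))
```

With the old value of 160, the mid-gray pixel at 170 would also be whitened. That is the aggressive end of the trade-off the original comment warns about.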

+ 61 - 21
ocr_tools/universal_doc_parser/config/bank_statement_glm_vl_local.yaml

@@ -14,6 +14,8 @@ input:
 
 preprocessor:
   module: "mineru"
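+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior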
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -24,10 +26,30 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
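-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png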
 
 # ============================================================
 # Layout detection - smart router (selects the model directly by scenario)
@@ -71,11 +93,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
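-  # Debug visualization settings
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: false              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"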
 
 # ============================================================
 # OCR recognition settings
@@ -89,6 +116,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Table classification (automatically distinguishes wired and wireless tables)
 # ============================================================
@@ -100,11 +137,12 @@ table_classification:
 
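   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_classification  # writes to debug/table_classification/
+    save_table_lines: true      # paddle line-detection overlay
+    image_format: "png"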
 
 # ============================================================
 # Wired-table recognition settings (MinerU UNet)
@@ -144,14 +182,16 @@ table_recognition_wired:
 
   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    save_connected_components: true  # save the cell images extracted via connected components
-    save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
-    save_text_overlay: true      # save the text-fill overlay
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_recognition_wired  # writes to debug/table_recognition_wired/
+    save_table_lines: true
+    save_connected_components: true
+    save_grid_structure: true
+    save_text_overlay: true
+    image_format: "png"
+    # Cell re-OCR crops: debug/table_recognition_wired/tablecell_ocr/
 
 # ============================================================
 # VL recognition - uses GLM-OCR (wireless tables + seal recognition)
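
Review note: the `contrast_enhancement` block lists a `linear` method with `black_percentile: 2.0` and `white_percentile: 98.0`. The implementation is not part of this diff. One common reading of such parameters is a percentile-based linear stretch: map the low percentile to 0, the high percentile to 255, and clip the rest. The sketch below assumes that reading; `percentile` and `linear_stretch` are hypothetical names.

```python
from typing import List, Sequence

GrayImage = List[List[int]]  # rows of 0-255 gray values


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of an ascending sequence, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("percentile of an empty sequence")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def linear_stretch(
    gray: GrayImage,
    black_percentile: float = 2.0,
    white_percentile: float = 98.0,
) -> GrayImage:
    """Map the black percentile to 0 and the white percentile to 255, clipping outliers."""
    flat = sorted(px for row in gray for px in row)
    lo = percentile(flat, black_percentile)
    hi = percentile(flat, white_percentile)
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in gray]
    return [
        [max(0, min(255, round((px - lo) * 255.0 / (hi - lo)))) for px in row]
        for row in gray
    ]


if __name__ == "__main__":
    print(linear_stretch([[50, 100], [150, 200]], 0.0, 100.0))
```

Unlike the `text_restore` method the configs select by default, a global stretch like this treats text and background alike, which may be why the config comment calls `text_restore` closer to the original.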

+ 46 - 10
ocr_tools/universal_doc_parser/config/bank_statement_mineru_vl.yaml

@@ -11,8 +11,10 @@ input:
 
 preprocessor:
   module: "mineru"
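+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior
+  order: orient_first
   orientation_classifier:
-    enabled: true  # enabled automatically for scans; skipped automatically for digital PDFs
+    enabled: true
     model_name: "paddle_orientation_classification"
     model_dir: null  # use the default path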
   unwarping:
@@ -21,10 +23,30 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
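-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png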
 
 layout_detection:
   # MinerU-VL layout (layout detection through the VLM service)
@@ -43,12 +65,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
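-  # Debug visualization settings (aligned with MinerUWiredTableRecognizer.DebugOptions)
-  # Off by default. When enabled, saves: layout detection results
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: true               # whether to enable debug visualization output
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: true              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"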
 
 # ============================================================
 # VL recognition (tables, formulas)
@@ -78,6 +104,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Output settings
 # ============================================================

+ 45 - 9
ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl.yaml

@@ -11,6 +11,8 @@ input:
 
 preprocessor:
   module: "mineru"
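+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior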
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -21,10 +23,30 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
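-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png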
 
 layout_detection:
   # module: "paddle"
@@ -48,12 +70,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
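-  # Debug visualization settings (aligned with MinerUWiredTableRecognizer.DebugOptions)
-  # Off by default. When enabled, saves: layout detection results
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: true               # whether to enable debug visualization output
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: true              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"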
 
 # ============================================================
 # VL recognition (tables, formulas)
@@ -84,6 +110,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Output settings
 # ============================================================

+ 61 - 21
ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl_local.yaml

@@ -14,6 +14,8 @@ input:
 
 preprocessor:
   module: "mineru"
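+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior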
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -24,10 +26,30 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
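-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png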
 
 # ============================================================
 # Layout detection - smart router (selects the model directly by scenario)
@@ -71,11 +93,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
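-  # Debug visualization settings
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: false              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"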
 
 # ============================================================
 # OCR recognition settings
@@ -89,6 +116,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Table classification (automatically distinguishes wired and wireless tables)
 # ============================================================
@@ -100,11 +137,12 @@ table_classification:
 
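   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_classification  # writes to debug/table_classification/
+    save_table_lines: true      # paddle line-detection overlay
+    image_format: "png"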
 
 # ============================================================
 # Wired-table recognition settings (MinerU UNet)
@@ -144,14 +182,16 @@ table_recognition_wired:
 
   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    save_connected_components: true  # save the cell images extracted via connected components
-    save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
-    save_text_overlay: true      # save the text-fill overlay
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_recognition_wired  # writes to debug/table_recognition_wired/
+    save_table_lines: true
+    save_connected_components: true
+    save_grid_structure: true
+    save_text_overlay: true
+    image_format: "png"
+    # Cell re-OCR crops: debug/table_recognition_wired/tablecell_ocr/
 
 # ============================================================
 # VL recognition - uses PaddleOcr-VL (wireless tables + seal recognition)

+ 59 - 25
ocr_tools/universal_doc_parser/config/bank_statement_smart_router.yaml

@@ -13,20 +13,43 @@ input:
 
 preprocessor:
   module: "mineru"
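+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior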
+  order: orient_first
   orientation_classifier:
     enabled: true
+    model_name: "paddle_orientation_classification"
+    model_dir: null  # use the default path
+  unwarping:
+    enabled: false
   # -------------------------------------------------------
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
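-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)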
-
-# ============================================================
-# Smart layout-model router settings
-# ============================================================
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png
+
 layout_detection:
   module: "smart_router"
   strategy: "ocr_eval"  # ocr_eval (recommended; picks the best model by OCR evaluation), auto (fast mode; based on document features)
@@ -81,6 +104,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: true              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
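 # Table classification (automatically distinguishes wired and wireless tables)
 table_classification:
   enabled: true               # whether to enable automatic table classification (off by default; manual configuration is used)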
@@ -88,14 +121,18 @@ table_classification:
   confidence_threshold: 0.5   # classification confidence threshold
   batch_size: 16              # batch size
 
-  # Debug visualization settings (aligned with MinerUWiredTableRecognizer.DebugOptions)
-  # Off by default. When enabled, saves: table lines
+
+
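+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: true               # whether to enable debug visualization output
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: true              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"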
 
 # Wired-table recognition settings
 table_recognition_wired:
@@ -111,17 +148,14 @@ table_recognition_wired:
   # Whether to enable deskewing
   enable_deskew: true
 
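-  # Debug visualization settings (aligned with MinerUWiredTableRecognizer.DebugOptions)
-  # Off by default. When enabled, saves: table lines, connected components, logical grid structure, and text-overlay visualizations.
+  # Debug visualization settings
   debug_options:
-    enabled: true               # whether to enable debug visualization output
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    save_connected_components: true  # save the cell images extracted via connected components
-    save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
-    save_text_overlay: true      # save the text-fill overlay
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: true              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_classification  # writes to debug/table_classification/
+    save_table_lines: true      # paddle line-detection overlay
+    image_format: "png"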
 
 # VLM table recognition (used when a table is classified as 'wireless')
 vl_recognition:

+ 105 - 21
ocr_tools/universal_doc_parser/config/bank_statement_yusys_local.yaml

@@ -14,6 +14,8 @@ input:
 
 preprocessor:
   module: "mineru"
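+  # Page-level preprocessing order: orient_first = fix orientation first, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy-compatible behavior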
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -24,10 +26,75 @@ preprocessor:
   # Watermark removal (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
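-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # gray threshold (140-180): pixels above it are treated as watermark and set to white
-                            # higher = more conservative (watermark residue); lower = more aggressive (light body text is lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing backfires on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: masked_adaptive # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # grayscale closing after watermark removal; 0 skips it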
+    mask:
+      mask_mode: light_on_white     # light_on_white | diagonal_midtone
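+      text_protect_gray_max: 130    # gray<=130: body text is hard-protected and never whitened
+      light_gray_low: 236           # light-pixel candidates (used by geom_candidate)
+      light_gray_high: 253
+      whiten_gray_low: 200          # lower gray bound for whitening inside the geometric band (scheme E; lower than the candidate bound)
+      direction_filter: hough       # hough = scheme C diagonal line segments | block = legacy block-wise gradients
+      morph_close_kernel: 0
+      morph_dilate_kernel: 0
+      min_component_area: 200
+      debug_block_maps: true        # output diag/hv heatmaps
+      debug_block_size: 48
+      hough_midtone_low: 200        # Canny runs only on the midtone band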
+      hough_midtone_high: 254
+      hough_canny_low: 30
+      hough_canny_high: 100
+      hough_threshold: 25
+      hough_min_line_length: 35
+      hough_max_line_gap: 18
+      hough_line_thickness: 12
+      hough_band_dilate_radius: 16
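+      hough_use_angle_statistics: true   # find the main peak from an angle histogram
+      hough_angle_tolerance: 5.0       # angle difference from the main peak <= this value (degrees)
+      hough_secondary_peak_ratio: 0.35 # weight of the secondary peak relative to the main peak
+      hough_min_length_percentile: 25.0  # filters out short segments
+      midtone_low: 95
+      midtone_high: 235           # used in diagonal_midtone mode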
+      remove_horizontal_vertical: true
+      diagonal_enhance: true
+      diagonal_kernel_length: 25
+      horizontal_kernel_length: 35
+      vertical_kernel_length: 35
+      morph_open_kernel: 2
+      dmorph_close_kernel: 3
+      text_protect_percentile: 10.0
+      background_threshold: 248
+      seal_protect: true
+    adaptive:
+      whiten_mode: mask_fill       # mask_fill = whiten everything inside the mask | threshold_in_mask
+      text_percentile: 10.0
+      watermark_percentile: 70.0   # used when whiten_mode=threshold_in_mask
+      background_percentile: 95.0
+      background_threshold: 248
+      wm_margin: 12
+      text_protect_max: 120
+    # Contrast enhancement after watermark removal (text_restore darkens strokes; closer to the original than a global gamma)
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke mottling after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # used when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout command-line flags
+      output_dir: null            # null: use the pipeline output directory
+      prefix: ""                  # file-name prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png
 
 # ============================================================
 # Layout detection - smart router (selects the model directly by scenario)
@@ -71,11 +138,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
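-  # Debug visualization settings
+  # Debug visualization (drawn on inference_image, the same image the layout detector receives)
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved file-name prefix (e.g. the page number)
+    enabled: false              # controlled by the --debug / --debug-layout command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing
+    save_json: true
+    image_format: "png"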
 
 # ============================================================
 # OCR recognition settings
@@ -89,6 +161,15 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+  # Debug visualization (drawn on inference_image, the same image full-page OCR receives)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr command-line flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Table classification (automatically distinguishes wired and wireless tables)
 # ============================================================
@@ -100,11 +181,12 @@ table_classification:
 
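   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_classification  # writes to debug/table_classification/
+    save_table_lines: true      # paddle line-detection overlay
+    image_format: "png"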
 
 # ============================================================
 # Wired-table recognition settings (MinerU UNet)
@@ -144,14 +226,16 @@ table_recognition_wired:
 
   # Debug visualization settings
   debug_options:
-    enabled: false              # controlled centrally by the --debug flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save the table-line visualization (UNet horizontal/vertical line overlay)
-    save_connected_components: true  # save the cell images extracted via connected components
-    save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
-    save_text_overlay: true      # save the text-fill overlay
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved file-name prefix (e.g. the page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table command-line flags
+    output_dir: null            # null: injected per page by the pipeline
+    prefix: ""
+    subdir: table_recognition_wired  # writes to debug/table_recognition_wired/
+    save_table_lines: true
+    save_connected_components: true
+    save_grid_structure: true
+    save_text_overlay: true
+    image_format: "png"
+    # Cell re-OCR crops: debug/table_recognition_wired/tablecell_ocr/
 
 # ============================================================
 # VL recognition - uses GLM-OCR (wireless tables + seal recognition)
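
Review note: the `mask` block above sets `hough_use_angle_statistics: true` and `hough_angle_tolerance: 5.0`, described as finding the main peak of an angle histogram and keeping segments within the tolerance of it. The sketch below illustrates that idea on plain `(x1, y1, x2, y2)` segments such as a probabilistic Hough transform would return. It is an assumption-laden illustration, not the repository's code: the function names and the `bin_width` parameter are hypothetical, and the secondary-peak and length-percentile rules from the config are left out.

```python
import math
from collections import Counter
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def orientation_deg(seg: Segment) -> float:
    """Orientation of a segment in degrees, folded into [0, 180)."""
    x1, y1, x2, y2 = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def orientation_gap(a: float, b: float) -> float:
    """Smallest gap between two orientations on the 180-degree circle."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def keep_dominant_direction(
    segments: List[Segment],
    tolerance: float = 5.0,
    bin_width: float = 5.0,  # hypothetical histogram resolution
) -> List[Segment]:
    """Keep segments whose orientation lies near the histogram's main peak."""
    if not segments:
        return []
    angles = [orientation_deg(s) for s in segments]
    histogram = Counter(int(a // bin_width) for a in angles)
    peak_bin = histogram.most_common(1)[0][0]
    peak_angle = (peak_bin + 0.5) * bin_width  # centre of the fullest bin
    return [
        s for s, a in zip(segments, angles)
        if orientation_gap(a, peak_angle) <= tolerance
    ]


if __name__ == "__main__":
    segs = [(0, 0, 10, 11), (0, 0, 20, 21), (5, 5, 30, 33),  # diagonal strokes
            (0, 0, 10, 0), (0, 0, 0, 10)]                    # table rulings
    print(keep_dominant_direction(segs))
```

Table rulings are horizontal or vertical, so a filter of this kind drops them and keeps the diagonal strokes typical of the watermark.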

+ 61 - 21
ocr_tools/universal_doc_parser/config/bank_statement_yusys_v4.yaml

@@ -13,6 +13,8 @@ input:
 
 preprocessor:
   module: "mineru"
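+  # Page-level preprocessing order: orient_first = correct orientation, then remove the watermark (recommended for diagonal bank watermarks); watermark_first = legacy behavior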
+  order: orient_first
   orientation_classifier:
     enabled: true
     model_name: "paddle_orientation_classification"
@@ -23,10 +25,30 @@ preprocessor:
   # Watermark removal config (for light, diagonal text watermarks on bank statements)
   # -------------------------------------------------------
   watermark_removal:
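-    enabled: true           # whether to enable watermark removal
-    threshold: 160          # grayscale threshold (140-180): pixels above it are treated as watermark and turned white
-                            # higher values are conservative (watermark may remain); lower values are aggressive (light body text may be lost)
-    morph_close_kernel: 0   # morphological closing kernel size (pixels); the default morph_kernel is now 0 (closing is counterproductive on non-binary images)
+    enabled: false           # whether to enable watermark removal
+    method: threshold # threshold | masked | masked_adaptive
+    threshold: 175          # global threshold, or the fallback threshold when masking fails (140-180)
+    morph_close_kernel: 0   # closing applied to the grayscale image after watermark removal; 0 skips it
+    # Contrast enhancement after watermark removal (text_restore darkens the strokes and stays closer to the original than a global gamma)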
+    contrast_enhancement:
+      enabled: true
+      method: text_restore   # text_restore | clahe | gamma | linear
+      text_black_target: 85  # raised slightly to reduce stroke artifacts after watermark removal (the previous 75 was too dark)
+      background_threshold: 248
+      text_lo_percentile: 1.0
+      text_hi_percentile: 99.0
+      gamma: 0.75            # applies when method=gamma
+      clip_limit: 2.0        # method=clahe
+      tile_grid_size: 8
+      black_percentile: 2.0  # method=linear
+      white_percentile: 98.0
+    debug_options:
+      enabled: false              # controlled centrally by the --debug / --debug-layout CLI flags
+      output_dir: null            # when null, the pipeline output directory is used
+      prefix: ""                  # filename prefix (page_name is injected at runtime)
+      subdir: watermark_removal   # writes to debug/watermark_removal/
+      save_compare: true          # save the side-by-side comparison image *_watermark_compare.*
+      image_format: "png"         # jpg / png
 
 # ============================================================
 # Layout detection config - smart router (selects the model directly by scene)
@@ -70,11 +92,16 @@ layout_detection:
     min_text_width_ratio: 0.4         # minimum width ratio (40%)
     min_text_height_ratio: 0.3        # minimum height ratio (30%)
 
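-  # Debug visualization config
+  # Debug visualization (base image is inference_image, the same input Layout detection uses)
   debug_options:
-    enabled: false              # controlled centrally by the --debug CLI flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    prefix: ""                  # saved-filename prefix (e.g. page number)
+    enabled: false              # controlled by the --debug / --debug-layout CLI flags
+    output_dir: null            # when null, injected per page by the pipeline
+    prefix: ""
+    subdir: layout_detection    # writes to debug/layout_detection/
+    save_raw: true              # before post-processing
+    save_post_processed: true   # after post-processing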
+    save_json: true
+    image_format: "png"
 
 # ============================================================
 # OCR recognition config
@@ -88,6 +115,16 @@ ocr_recognition:
   batch_size: 8
   device: "cpu"
 
+
+  # Debug visualization (base image is inference_image, the same input full-page OCR uses)
+  debug_options:
+    enabled: false              # controlled by the --debug / --debug-ocr CLI flags
+    output_dir: null
+    prefix: ""
+    subdir: ocr_recognition     # writes to debug/ocr_recognition/
+    save_json: true
+    image_format: png
+
 # ============================================================
 # Table classification config (automatically distinguishes wired/wireless tables)
 # ============================================================
@@ -99,11 +136,12 @@ table_classification:
 
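   # Debug visualization config
   debug_options:
-    enabled: false              # controlled centrally by the --debug CLI flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save table-line visualization (UNet horizontal/vertical line overlay)
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved-filename prefix (e.g. page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table CLI flags
+    output_dir: null            # when null, injected per page by the pipeline
+    prefix: ""
+    subdir: table_classification  # writes to debug/table_classification/
+    save_table_lines: true      # paddle line-detection overlay image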
+    image_format: "png"
 
 # ============================================================
 # Wired table recognition config (MinerU UNet)
@@ -143,14 +181,16 @@ table_recognition_wired:
 
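   # Debug visualization config
   debug_options:
-    enabled: false              # controlled centrally by the --debug CLI flag; do not hardcode true here
-    output_dir: null             # debug output directory; null disables output
-    save_table_lines: true       # save table-line visualization (UNet horizontal/vertical line overlay)
-    save_connected_components: true  # save the cell image extracted from connected components
-    save_grid_structure: true    # save the logical grid structure (row/col/rowspan/colspan)
-    save_text_overlay: true      # save the text-fill overlay image
-    image_format: "png"          # visualization image format: png/jpg
-    prefix: ""                  # saved-filename prefix (e.g. page number / table index)
+    enabled: false              # controlled centrally by the --debug / --debug-table CLI flags
+    output_dir: null            # when null, injected per page by the pipeline
+    prefix: ""
+    subdir: table_recognition_wired  # writes to debug/table_recognition_wired/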
+    save_table_lines: true
+    save_connected_components: true
+    save_grid_structure: true
+    save_text_overlay: true
+    image_format: "png"
+    # Cropped images from second-pass cell OCR: debug/table_recognition_wired/tablecell_ocr/
 
 # ============================================================
 # VL recognition config - uses GLM-OCR (wireless tables + seal recognition)
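
The `subdir` comments in this config suggest that each module writes its debug files under `<output_dir>/debug/<subdir>/`. A minimal sketch of that layout follows, assuming a hypothetical helper `sketch_module_debug_dir`; the project's own `resolve_module_debug_dir` is not shown in this diff and may behave differently.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def sketch_module_debug_dir(opts: Dict[str, Any], default_subdir: str) -> Optional[Path]:
    """Hypothetical resolver: <output_dir>/debug/<subdir>, or None when debug is off.

    `opts` stands for a module's debug_options after the pipeline has
    injected the per-page output_dir (an assumption of this sketch).
    """
    if not opts.get("enabled", False) or not opts.get("output_dir"):
        return None
    subdir = opts.get("subdir") or default_subdir
    return Path(opts["output_dir"]) / "debug" / subdir


opts = {"enabled": True, "output_dir": "out/page_001", "subdir": "table_classification"}
print(sketch_module_debug_dir(opts, "table_classification"))
print(sketch_module_debug_dir({"enabled": False}, "ocr_recognition"))
```

With `enabled: false` or `output_dir: null` the sketch returns `None`, which matches the comments stating that nothing is written until the pipeline injects a directory.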

+ 10 - 8
ocr_tools/universal_doc_parser/core/element_processors.py

@@ -366,6 +366,7 @@ class ElementProcessors:
         basename: Optional[str] = None,
         normalize_numbers: bool = True,
         debug_mode: bool = False,
+        debug_options: Optional[Dict[str, Any]] = None,
     ) -> Dict[str, Any]:
         """
         Process a table element with UNet wired table recognition
@@ -399,20 +400,21 @@ class ElementProcessors:
             if not self.wired_table_recognizer:
                 raise RuntimeError("Wired table recognizer not available")
             
-            # Build the debug-options override
-            debug_opts_override = {'enabled': debug_mode}
-            if output_dir:
-                debug_opts_override['output_dir'] = output_dir
-            if basename:
-                # Use the full basename as the prefix (e.g. "filename_page_001")
-                debug_opts_override['prefix'] = basename
+            if debug_options is not None:
+                debug_opts_override = dict(debug_options)
+            else:
+                debug_opts_override = {'enabled': debug_mode}
+                if output_dir:
+                    debug_opts_override['output_dir'] = output_dir
+                if basename:
+                    debug_opts_override['prefix'] = basename
 
             wired_res = self.wired_table_recognizer.recognize(
                 table_image=cropped_table,
                 # ocr_boxes=ocr_boxes_for_wired,
                 ocr_boxes=ocr_boxes,
                 pdf_type=pdf_type,
-                debug_options=debug_opts_override
+                debug_options=debug_opts_override,
             )
             
             if not (wired_res.get('html') or wired_res.get('cells')):
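
The new branch gives an explicit `debug_options` dict priority over the legacy `debug_mode` / `output_dir` / `basename` arguments. The precedence can be sketched in isolation as follows; `build_debug_override` is a hypothetical free function that mirrors the logic in the hunk above, not a function from the repository.

```python
from typing import Any, Dict, Optional


def build_debug_override(
    debug_mode: bool = False,
    output_dir: Optional[str] = None,
    basename: Optional[str] = None,
    debug_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-alone version of the override logic in the diff."""
    if debug_options is not None:
        # An explicit dict wins; copy it so the caller's dict is not mutated later.
        return dict(debug_options)
    # Legacy path: derive the override from the individual arguments.
    override: Dict[str, Any] = {"enabled": debug_mode}
    if output_dir:
        override["output_dir"] = output_dir
    if basename:
        override["prefix"] = basename
    return override


legacy = build_debug_override(debug_mode=True, output_dir="out", basename="doc_page_001")
explicit = build_debug_override(debug_mode=True, debug_options={"enabled": False})
print(legacy)
print(explicit)
```

An explicit dict, even an empty one, replaces the legacy arguments entirely, so `debug_mode=True` has no effect in the second call.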

+ 44 - 30
ocr_tools/universal_doc_parser/core/layout_model_router.py

@@ -48,18 +48,27 @@ class SmartLayoutRouter(BaseLayoutDetector):
         
     def initialize(self):
         """Initialize all models."""
-        # Read the post_process config (from the parent config)
+        # Read the post_process / debug_options config (from the parent smart_router config)
         post_process_config = self.config.get('post_process', {})
-        
+        layout_debug_options = self.config.get('debug_options', {})
+        if not isinstance(layout_debug_options, dict):
+            layout_debug_options = {}
+
+        def _merge_child_model_config(child_cfg: Dict[str, Any]) -> Dict[str, Any]:
+            merged = child_cfg.copy()
+            if post_process_config:
+                merged['post_process'] = post_process_config
+            if layout_debug_options:
+                merged['debug_options'] = layout_debug_options.copy()
+            return merged
+
         # Initialize the primary models
         for model_name, model_config in self.model_configs.items():
             try:
                 logger.info(f"🔧 Initializing layout model: {model_name}")
-                # Add the post_process config to the child model config
-                if post_process_config:
-                    model_config = model_config.copy()
-                    model_config['post_process'] = post_process_config
-                detector = ModelFactory.create_layout_detector(model_config)
+                detector = ModelFactory.create_layout_detector(
+                    _merge_child_model_config(model_config)
+                )
                 self.models[model_name] = detector
                 logger.info(f"✅ Model {model_name} initialized")
             except Exception as e:
@@ -68,11 +77,9 @@ class SmartLayoutRouter(BaseLayoutDetector):
         # Initialize the fallback model (if configured)
         if self.fallback_config:
             try:
-                # Add the post_process config to the fallback model config
-                fallback_config = self.fallback_config.copy()
-                if post_process_config:
-                    fallback_config['post_process'] = post_process_config
-                fallback_detector = ModelFactory.create_layout_detector(fallback_config)
+                fallback_detector = ModelFactory.create_layout_detector(
+                    _merge_child_model_config(self.fallback_config)
+                )
                 self.models['fallback'] = fallback_detector
                 logger.info("✅ Fallback model initialized")
             except Exception as e:
@@ -97,6 +104,19 @@ class SmartLayoutRouter(BaseLayoutDetector):
     def set_scene_name(self, scene_name: Optional[str]):
         """Set the scene name (used by the scene strategy)."""
         self.scene_name = scene_name
+
+    def _propagate_layout_debug_context(self, model: BaseLayoutDetector) -> None:
+        """Pass the router's debug context to a child layout model (needed by the scene/auto strategies)."""
+        if not self._is_layout_debug_enabled():
+            return
+        model.debug_mode = True  # type: ignore[attr-defined]
+        if self.output_dir:
+            model.output_dir = self.output_dir  # type: ignore[attr-defined]
+        if self.page_name:
+            model.page_name = self.page_name  # type: ignore[attr-defined]
+        parent_opts = self._layout_debug_options()
+        if parent_opts:
+            model.config['debug_options'] = parent_opts.copy()
     
     def _detect_raw(
         self, 
@@ -177,7 +197,9 @@ class SmartLayoutRouter(BaseLayoutDetector):
             selected_model = next(iter(self.models.keys()))
 
         logger.info(f"🎯 Scene strategy selected model: {selected_model} (scene: {self.scene_name})")
-        return self.models[selected_model].detect(image)
+        model = self.models[selected_model]
+        self._propagate_layout_debug_context(model)
+        return model.detect(image)
     
     def _ocr_eval_detect(
         self, 
@@ -201,14 +223,8 @@ class SmartLayoutRouter(BaseLayoutDetector):
             if model_name == 'fallback':
                 continue  # skip the fallback model (unless every model fails)
             try:
-                # Pass the debug-mode settings to the child model (if enabled)
-                if self.debug_mode:
-                    model.debug_mode = self.debug_mode  # type: ignore
-                    if self.output_dir:
-                        model.output_dir = self.output_dir  # type: ignore
-                    if self.page_name:
-                        model.page_name = self.page_name  # type: ignore
-                
+                self._propagate_layout_debug_context(model)
+
                 # Call detect(); the base class runs post-processing automatically
                 results = model.detect(image)
                 all_postprocessed_results[model_name] = results
@@ -221,13 +237,7 @@ class SmartLayoutRouter(BaseLayoutDetector):
             # If every model failed, try the fallback model
             if 'fallback' in self.models:
                 logger.info("🔄 All models failed, using fallback model")
-                # Pass the debug-mode settings to the fallback model (if enabled)
-                if self.debug_mode:
-                    self.models['fallback'].debug_mode = self.debug_mode  # type: ignore
-                    if self.output_dir:
-                        self.models['fallback'].output_dir = self.output_dir  # type: ignore
-                    if self.page_name:
-                        self.models['fallback'].page_name = self.page_name  # type: ignore
+                self._propagate_layout_debug_context(self.models['fallback'])
                 # The fallback model uses detect() (post-processing runs automatically)
                 fallback_result = self.models['fallback'].detect(image)
                 return fallback_result
@@ -337,10 +347,12 @@ class SmartLayoutRouter(BaseLayoutDetector):
         # Detect with the selected model (detect() runs post-processing automatically)
         if selected_model in self.models:
             model = self.models[selected_model]
+            self._propagate_layout_debug_context(model)
             results = model.detect(image)
         else:
             # Fall back to the first available model
             first_model = next(iter(self.models.values()))
+            self._propagate_layout_debug_context(first_model)
             results = first_model.detect(image)
         
         return results
@@ -519,9 +531,11 @@ class SmartLayoutRouter(BaseLayoutDetector):
                                   font, font_scale, (255, 255, 255), text_thickness)
             
             # Save the comparison image
-            debug_dir = Path(self.output_dir) / "debug_comparison" / "layout_comparison"
+            from ocr_utils.module_debug_viz import resolve_module_debug_dir
+
+            debug_dir = resolve_module_debug_dir(self.output_dir, "layout_comparison")
             debug_dir.mkdir(parents=True, exist_ok=True)
-            output_path = debug_dir / f"{self.page_name}_layout_comparison.jpg"
+            output_path = debug_dir / f"{self.page_name}_layout_comparison.png"
             cv2.imwrite(str(output_path), vis_image)
             logger.info(f"📊 Saved layout comparison image: {output_path}")
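
`_merge_child_model_config` copies the router-level `post_process` and `debug_options` into each child model's config without modifying the child's original dict. The sketch below reproduces that behavior as a free function with the two parent dicts passed explicitly (in the diff they are captured from the enclosing `initialize`); the sample config values are made up for illustration.

```python
from typing import Any, Dict


def merge_child_model_config(
    child_cfg: Dict[str, Any],
    post_process_config: Dict[str, Any],
    layout_debug_options: Dict[str, Any],
) -> Dict[str, Any]:
    """Stand-alone sketch of the merge helper shown in the diff."""
    merged = child_cfg.copy()  # shallow copy: the child's own dict stays untouched
    if post_process_config:
        merged["post_process"] = post_process_config  # shared by reference, as in the diff
    if layout_debug_options:
        merged["debug_options"] = layout_debug_options.copy()  # per-child copy
    return merged


child = {"module": "example_layout_model"}  # hypothetical child config
parent_debug = {"enabled": True, "subdir": "layout_detection"}
merged = merge_child_model_config(child, {}, parent_debug)

# Changing the child's copy does not leak back into the router-level options.
merged["debug_options"]["enabled"] = False
print(child)
print(parent_debug["enabled"])
```

Because `debug_options` is copied per child while `post_process` is passed by reference, a child that mutates its post-processing settings would still affect its siblings; only the debug options are isolated.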
             

+ 131 - 51
ocr_tools/universal_doc_parser/core/pipeline_manager_v2.py

@@ -392,50 +392,42 @@ class EnhancedDocPipeline:
             'pdf_type': pdf_type
         }
         
-        # Image used for detection (may be rotated)
-        detection_image = original_image.copy()
         rotate_angle = 0
+        pdf_rotate_angle: Optional[int] = None
+        use_orientation_classifier = pdf_type == 'ocr'
 
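-        # 0. Page-level watermark removal (once per page; downstream steps such as table cropping only correct orientation, so the watermark is not removed twice)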
-        detection_image = self.preprocessor.remove_watermark(detection_image)
-        
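-        # 1. Page orientation detection
-        # rotate_angle is defined uniformly: the counterclockwise angle (0/90/180/270) by which the image must be rotated to become upright
-        if pdf_type == 'ocr':
-            # Scanned document: use OCR orientation detection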
-            try:
-                detection_image, rotate_angle = self.preprocessor.process(
-                    detection_image, skip_watermark=True
-                )
-                page_result['angle'] = rotate_angle
-                
-                if rotate_angle != 0:
-                    logger.info(f"📐 Page {page_idx}: rotated {rotate_angle}° for detection")
-            except Exception as e:
-                logger.warning(f"⚠️ Orientation detection failed: {e}")
-        elif pdf_type == 'txt' and pdf_doc is not None:
-            # Text PDF: read the PDF page rotation and convert it to the unified rotate_angle definition
+        if pdf_type == 'txt' and pdf_doc is not None:
             try:
                 pdf_rotation_angle = PDFUtils.get_page_rotation(pdf_doc, page_idx)
                 if pdf_rotation_angle != 0:
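-                    # Convert to the OCR definition: the counterclockwise angle by which the image must be rotated
-                    # PDF rotation 270° means the content is rotated 270° counterclockwise = 90° clockwise
-                    # To restore the upright view, rotate 90° counterclockwise (i.e. 360-270=90)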
-                    rotate_angle = (360 - pdf_rotation_angle) % 360
-                    if rotate_angle == 360:
-                        rotate_angle = 0
-                    
-                    # Rotate the image upright (using rotate_angle, counterclockwise)
-                    from PIL import Image
-                    pil_rotated = Image.fromarray(detection_image).rotate(rotate_angle, expand=True)
-                    detection_image = np.array(pil_rotated)
-                    page_result['angle'] = rotate_angle
-                    logger.info(f"📐 Page {page_idx}: PDF rotation {pdf_rotation_angle}°, rotated image {rotate_angle}° to upright")
+                    pdf_rotate_angle = (360 - pdf_rotation_angle) % 360
+                    if pdf_rotate_angle == 360:
+                        pdf_rotate_angle = 0
+                    if pdf_rotate_angle:
+                        logger.info(
+                            f"📐 Page {page_idx}: PDF rotation {pdf_rotation_angle}°, "
+                            f"will rotate image {pdf_rotate_angle}° to upright"
+                        )
             except Exception as e:
                 logger.warning(f"⚠️ Failed to get PDF rotation: {e}")
 
-        
-        # 2. Layout detection
+        # 0. Page-level preprocessing (orientation correction → watermark removal; see preprocessor.order)
+        self._inject_watermark_debug_context(output_dir, page_name)
+        try:
+            detection_image, rotate_angle = self.preprocessor.prepare_detection_image(
+                original_image.copy(),
+                pdf_rotate_angle=pdf_rotate_angle,
+                use_orientation_classifier=use_orientation_classifier,
+            )
+            page_result['angle'] = rotate_angle
+            page_result['inference_image'] = detection_image
+            if rotate_angle != 0:
+                logger.info(f"📐 Page {page_idx}: detection image upright (rotate {rotate_angle}°)")
+        except Exception as e:
+            logger.warning(f"⚠️ Page preprocessing failed, using original copy: {e}")
+            detection_image = original_image.copy()
+
+        # 1. Layout detection
         try:
             # If the smart router is used with the ocr_eval strategy, fetch OCR spans first (text-box detection only, no text recognition)
             ocr_spans_for_layout = None
@@ -456,12 +448,18 @@ class EnhancedDocPipeline:
                     except Exception as e:
                         logger.warning(f"⚠️ Pre-OCR text box detection for layout evaluation failed: {e}")
             
-            # Inject per-page runtime info (output_dir/page_name only matter when the layout detector's own debug is enabled)
-            if hasattr(self.layout_detector, 'debug_mode') and self.layout_detector.debug_mode:  # type: ignore
-                if output_dir and hasattr(self.layout_detector, 'output_dir'):
+            # Inject per-page runtime info (the SmartLayoutRouter scene strategy must pass it on to child models)
+            layout_dbg = (
+                getattr(self.layout_detector, '_is_layout_debug_enabled', None)
+                and self.layout_detector._is_layout_debug_enabled()  # type: ignore
+            )
+            if layout_dbg and hasattr(self.layout_detector, 'output_dir'):
+                if output_dir:
                     self.layout_detector.output_dir = output_dir  # type: ignore
                 if page_name and hasattr(self.layout_detector, 'page_name'):
                     self.layout_detector.page_name = page_name  # type: ignore
+                if hasattr(self.layout_detector, 'debug_mode'):
+                    self.layout_detector.debug_mode = True  # type: ignore
             
             # Run layout detection (passing OCR spans if available)
             if ocr_spans_for_layout is not None and hasattr(self.layout_detector, 'detect'):
@@ -543,6 +541,9 @@ class EnhancedDocPipeline:
                 all_ocr_spans = SpanMatcher.remove_duplicate_spans(all_ocr_spans)
                 all_ocr_spans = self._sort_spans_by_position(all_ocr_spans)
                 logger.info(f"📝 Page {page_idx}: OCR detected {len(all_ocr_spans)} text spans")
+                self._save_page_ocr_debug_if_enabled(
+                    detection_image, all_ocr_spans, output_dir, page_name
+                )
             except Exception as e:
                 logger.warning(f"⚠️ Full-page OCR failed: {e}")                
             # 3.1 Debug mode: compare OCR and PDF extraction results
@@ -608,6 +609,77 @@ class EnhancedDocPipeline:
         page_result['discarded_blocks'] = sorted_discarded
         return page_result
 
+    def _build_table_module_debug_override(
+        self,
+        module_key: str,
+        *,
+        output_dir: Optional[str],
+        prefix: Optional[str] = None,
+        enabled: bool = False,
+    ) -> Dict[str, Any]:
+        """Merge the table_* debug_options from the YAML; output goes to debug/{subdir}/."""
+        cfg_opts = self.config.get(module_key, {}).get('debug_options', {})
+        if not isinstance(cfg_opts, dict):
+            cfg_opts = {}
+        override: Dict[str, Any] = dict(cfg_opts)
+        override['enabled'] = bool(enabled or cfg_opts.get('enabled', False))
+        if output_dir:
+            override['output_dir'] = output_dir
+        if prefix is not None:
+            override['prefix'] = prefix
+        return override
+
+    def _is_page_ocr_debug_enabled(self) -> bool:
+        opts = self.config.get('ocr_recognition', {}).get('debug_options', {})
+        return isinstance(opts, dict) and bool(opts.get('enabled', False))
+
+    def _save_page_ocr_debug_if_enabled(
+        self,
+        image: np.ndarray,
+        spans: List[Dict[str, Any]],
+        output_dir: Optional[str],
+        page_name: Optional[str],
+    ) -> None:
+        """Save module debug output after full-page OCR (base image = inference_image, same as layout)."""
+        if not self._is_page_ocr_debug_enabled() or not output_dir or not page_name:
+            return
+        from ocr_utils.module_debug_viz import save_ocr_debug
+
+        opts = self.config.get('ocr_recognition', {}).get('debug_options', {})
+        if not isinstance(opts, dict):
+            opts = {}
+        save_ocr_debug(
+            image,
+            spans,
+            output_dir,
+            page_name,
+            subdir=opts.get('subdir', 'ocr_recognition'),
+            image_format=opts.get('image_format', 'png'),
+            save_json=bool(opts.get('save_json', True)),
+        )
+
+    def _inject_watermark_debug_context(
+        self,
+        output_dir: Optional[str],
+        page_name: Optional[str],
+    ) -> None:
+        """Inject the watermark debug output path per page (same as layout_detection)."""
+        pre = self.preprocessor
+        if pre is None or not hasattr(pre, '_is_watermark_debug_enabled'):
+            return
+        wm_opts = (
+            self.config.get('preprocessor', {})
+            .get('watermark_removal', {})
+            .get('debug_options', {})
+        )
+        if not isinstance(wm_opts, dict) or not wm_opts.get('enabled', False):
+            return
+        if output_dir:
+            pre.output_dir = output_dir  # type: ignore[attr-defined]
+        if page_name:
+            pre.page_name = page_name  # type: ignore[attr-defined]
+        pre.debug_mode = True  # type: ignore[attr-defined]
+
     @staticmethod
     def _convert_pdf_blocks_to_spans(
         pdf_text_blocks: List[Dict[str, Any]],
@@ -700,9 +772,10 @@ class EnhancedDocPipeline:
                         cv2.rectangle(vis_image, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (0, 0, 255), 2)
             
             # Save the comparison image
-            debug_dir = Path(output_dir) / "debug_comparison"
-            debug_dir.mkdir(parents=True, exist_ok=True)
-            output_path = debug_dir / f"{page_name}_ocr_comparison.jpg"
+            from ocr_utils.module_debug_viz import resolve_module_debug_dir
+
+            debug_dir = resolve_module_debug_dir(output_dir, "ocr_comparison")
+            output_path = debug_dir / f"{page_name}_ocr_comparison.png"
             cv2.imwrite(str(output_path), vis_image)
             
             # Save the comparison JSON
@@ -940,16 +1013,16 @@ class EnhancedDocPipeline:
                     bbox = item.get('bbox', [])
                     table_img = CoordinateUtils.crop_region(detection_image, bbox)
                     
-                    # Build the debug options
-                    cls_debug_opts = {'enabled': debug_mode}
-                    if output_dir:
-                        cls_debug_opts['output_dir'] = output_dir
-                    if basename:
-                        cls_debug_opts['prefix'] = f"{basename}_{idx}"
-                    
+                    cls_debug_opts = self._build_table_module_debug_override(
+                        'table_classification',
+                        output_dir=output_dir,
+                        prefix=f"{basename}_{idx}" if basename else None,
+                        enabled=debug_mode,
+                    )
+
                     cls_result = self.table_classifier.classify(
-                        table_img, 
-                        debug_options=cls_debug_opts
+                        table_img,
+                        debug_options=cls_debug_opts,
                     )
                     table_type = cls_result.get('table_type', 'wireless')
                     confidence = cls_result.get('confidence', 0.0)
@@ -967,11 +1040,18 @@ class EnhancedDocPipeline:
                 if should_use_wired:
                     # Wired-table path: UNet recognition
                     logger.info(f"🔷 Table {idx}: Using wired UNet recognition")
+                    wired_debug_opts = self._build_table_module_debug_override(
+                        'table_recognition_wired',
+                        output_dir=output_dir,
+                        prefix=f"{basename}_{idx}" if basename else None,
+                        enabled=debug_mode,
+                    )
                     element = self.element_processors.process_table_element_wired(
                         detection_image, item, scale, pre_matched_spans=spans, pdf_type=pdf_type,
                         output_dir=output_dir, basename=f"{basename}_{idx}",
                         normalize_numbers=normalize_numbers,
                         debug_mode=debug_mode,
+                        debug_options=wired_debug_opts,
                     )
                     # If wired recognition fails (returns empty HTML), fall back to the VLM
                     if not element['content'].get('html') and not element['content'].get('cells'):
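
The page-rotation hunk in this file converts a PDF page rotation into the counterclockwise angle needed to bring the image upright, using `(360 - pdf_rotation_angle) % 360`. A quick check of that arithmetic, with `to_upright_angle` as a hypothetical helper name:

```python
def to_upright_angle(pdf_rotation_angle: int) -> int:
    """Counterclockwise angle that undoes a PDF page rotation (same formula as the diff)."""
    return (360 - pdf_rotation_angle) % 360


for pdf_rotation in (0, 90, 180, 270):
    print(pdf_rotation, "->", to_upright_angle(pdf_rotation))
```

Since Python's `%` with a positive modulus always returns a value from 0 to 359, the `== 360` check that follows the conversion in the diff can never be true; it is harmless but redundant.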

+ 18 - 8
ocr_tools/universal_doc_parser/main_v2.py

@@ -187,6 +187,15 @@ def _apply_debug_overrides_to_config(
                 config['layout_detection']['debug_options'] = {}
             config['layout_detection']['debug_options']['enabled'] = True
             logger.info("✅ Layout detection debug output enabled")
+
+    # 1b. Watermark removal debug (page-level preprocessing; a page-level visualization like layout)
+    if enable_layout_debug:
+        preprocessor_cfg = config.setdefault('preprocessor', {})
+        wm_cfg = preprocessor_cfg.setdefault('watermark_removal', {})
+        if 'debug_options' not in wm_cfg:
+            wm_cfg['debug_options'] = {}
+        wm_cfg['debug_options']['enabled'] = True
+        logger.info("✅ Watermark removal debug output enabled")
     
     # 2. Table classification debug
     if enable_table_debug:
@@ -212,13 +221,14 @@ def _apply_debug_overrides_to_config(
             config['ocr_recognition']['debug_options']['enabled'] = True
             logger.info("✅ OCR recognition debug output enabled")
     
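-    # 5. Update the output config
+    # 5. Update the output config (module debug is decoupled from audit images; only the global --debug mode enables audit images by default)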
     if enable_layout_debug or enable_ocr_debug or enable_table_debug:
-        output_config = config.get('output', {})
+        output_config = config.setdefault('output', {})
         output_config['debug_mode'] = True
-        if enable_layout_debug or enable_ocr_debug:
-            output_config.setdefault('save_layout_image', True)
-            output_config.setdefault('save_ocr_image', True)
+    if debug:
+        output_config = config.setdefault('output', {})
+        output_config.setdefault('save_layout_image', True)
+        output_config.setdefault('save_ocr_image', True)
     
     # Log the current debug state
     if debug:
@@ -644,8 +654,8 @@ if __name__ == "__main__":
             # "config": "./config/bank_statement_paddle_vl_local.yaml",
             # "log_file": "./output/logs/bank_statement_paddle_vl_local/process.log",
 
-            "input": "/Users/zhch158/workspace/data/流水分析/陈3_微信图.pdf",
-            "output_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图/bank_statement_yusys_local",
+            "input": "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行.pdf",
+            "output_dir": "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行/bank_statement_yusys_local",
             "config": "./config/bank_statement_yusys_local.yaml",
             "log_file": "./output/logs/bank_statement_yusys_local/process.log",
 
@@ -662,7 +672,7 @@ if __name__ == "__main__":
             # "scene": "financial_report",
             
             # Page range (optional)
-            "pages": "3",  # process only the first page
+            "pages": "2",  # process only the first page
             # "pages": "1-3,5,7-10",  # process the specified pages
             # "pages": "83-109",  # process the specified pages
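
The switch from `config.get('output', {})` to `config.setdefault('output', {})` in `_apply_debug_overrides_to_config` matters when the config has no `output` key: `get` returns a detached dict, so anything written to it is lost. A minimal demonstration, with hypothetical function names:

```python
from typing import Any, Dict


def enable_debug_with_get(config: Dict[str, Any]) -> None:
    # Old behavior: if "output" is missing, the {} default is never stored in config.
    output_config = config.get("output", {})
    output_config["debug_mode"] = True


def enable_debug_with_setdefault(config: Dict[str, Any]) -> None:
    # New behavior: setdefault inserts the dict into config before returning it.
    output_config = config.setdefault("output", {})
    output_config["debug_mode"] = True


old_cfg: Dict[str, Any] = {}
new_cfg: Dict[str, Any] = {}
enable_debug_with_get(old_cfg)
enable_debug_with_setdefault(new_cfg)
print(old_cfg)  # {}
print(new_cfg)  # {'output': {'debug_mode': True}}
```

When `output` already exists in the config, both variants behave the same, because `get` then returns the stored dict itself.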
 

+ 200 - 158
ocr_tools/universal_doc_parser/models/adapters/base.py

@@ -25,6 +25,73 @@ class BaseAdapter(ABC):
 
 class BasePreprocessor(BaseAdapter):
     """Base class for preprocessors."""
+
+    def __init__(self, config: Dict[str, Any]):
+        super().__init__(config)
+        # Injected per page by the pipeline at runtime (same as layout_detection)
+        self.debug_mode: Optional[bool] = None
+        self.output_dir: Optional[str] = None
+        self.page_name: Optional[str] = None
+
+    def _watermark_debug_options(self) -> Dict[str, Any]:
+        wm_cfg = self.config.get('watermark_removal', {})
+        opts = wm_cfg.get('debug_options', {})
+        return opts if isinstance(opts, dict) else {}
+
+    def _is_watermark_debug_enabled(self) -> bool:
+        debug_mode = getattr(self, 'debug_mode', None)
+        if debug_mode is not None:
+            return bool(debug_mode)
+        return bool(self._watermark_debug_options().get('enabled', False))
+
+    def _resolve_watermark_debug_paths(self) -> Tuple[Optional[str], str]:
+        output_dir = getattr(self, 'output_dir', None)
+        if output_dir is None:
+            output_dir = self._watermark_debug_options().get('output_dir')
+        page_name = getattr(self, 'page_name', None)
+        if not page_name:
+            page_name = self._watermark_debug_options().get('prefix') or 'watermark'
+        prefix = self._watermark_debug_options().get('prefix', '')
+        if prefix and page_name and not str(page_name).startswith(str(prefix)):
+            page_name = f"{prefix}_{page_name}"
+        return output_dir, str(page_name)
+
+    def _save_watermark_debug_images(
+        self,
+        before: np.ndarray,
+        after: np.ndarray,
+        threshold: int,
+        morph_close_kernel: int,
+        contrast_cfg: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        """Save watermark debug images (delegates to ocr_utils.watermark_utils)."""
+        from ocr_utils.watermark_utils import save_watermark_removal_debug
+
+        output_dir, page_name = self._resolve_watermark_debug_paths()
+        if not output_dir:
+            return
+
+        opts = self._watermark_debug_options()
+        params: Dict[str, Any] = {
+            "threshold": threshold,
+            "morph_close_kernel": morph_close_kernel,
+        }
+        if contrast_cfg:
+            params["contrast_enhancement"] = contrast_cfg
+
+        try:
+            save_watermark_removal_debug(
+                before,
+                after,
+                output_dir,
+                page_name,
+                processing_params=params,
+                image_format=opts.get("image_format") or "png",
+                save_compare=opts.get("save_compare", True),
+                subdir=opts.get("subdir", "watermark_removal"),
+            )
+        except Exception as e:
+            logger.warning(f"Watermark debug save failed: {e}")
     
     def remove_watermark(self, image: Union[np.ndarray, Image.Image]) -> np.ndarray:
         """Page-level watermark removal (no-op by default; subclasses may override)."""
@@ -32,17 +99,82 @@ class BasePreprocessor(BaseAdapter):
             return np.array(image)
         return image
 
-    @abstractmethod
+    def _preprocess_order(self) -> str:
+        """Preprocessing step order: orient_first (default) | watermark_first."""
+        order = str(self.config.get('order', 'orient_first')).strip().lower()
+        if order not in ('orient_first', 'watermark_first'):
+            logger.warning(
+                f"Unknown preprocessor.order={order!r}, fallback to orient_first"
+            )
+            return 'orient_first'
+        return order
+
+    def correct_orientation(
+        self,
+        image: Union[np.ndarray, Image.Image],
+        *,
+        pdf_rotate_angle: Optional[int] = None,
+        use_orientation_classifier: bool = True,
+    ) -> tuple[np.ndarray, int]:
+        """
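+        Orientation correction only, without watermark removal. For cases such as table crops, where the page has already been preprocessed.
+
+        Args:
+            pdf_rotate_angle: page-level rotation of a text PDF (counterclockwise angle, consistent with the pipeline)
+            use_orientation_classifier: whether to use the orientation classifier (True for scanned documents)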
+        """
+        if isinstance(image, Image.Image):
+            image = np.array(image)
+
+        if pdf_rotate_angle:
+            pil_rotated = Image.fromarray(image).rotate(pdf_rotate_angle, expand=True)
+            return np.array(pil_rotated), int(pdf_rotate_angle)
+        return image, 0
+
+    def prepare_detection_image(
+        self,
+        image: Union[np.ndarray, Image.Image],
+        *,
+        pdf_rotate_angle: Optional[int] = None,
+        use_orientation_classifier: bool = True,
+    ) -> tuple[np.ndarray, int]:
+        """
+        Full page-level preprocessing: run orientation correction and watermark removal in the order given by preprocessor.order.
+
+        Returns:
+            (detection_image, rotate_angle)
+        """
+        if isinstance(image, Image.Image):
+            image = np.array(image)
+
+        order = self._preprocess_order()
+
+        def _orient(img: np.ndarray) -> tuple[np.ndarray, int]:
+            return self.correct_orientation(
+                img,
+                pdf_rotate_angle=pdf_rotate_angle,
+                use_orientation_classifier=use_orientation_classifier,
+            )
+
+        if order == 'watermark_first':
+            cleaned = self.remove_watermark(image)
+            return _orient(cleaned)
+
+        oriented, rotate_angle = _orient(image)
+        return self.remove_watermark(oriented), rotate_angle
+
     def process(
         self,
         image: Union[np.ndarray, Image.Image],
         skip_watermark: bool = False,
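     ) -> tuple[np.ndarray, int]:
         """
-        Process the image.
-        Return the processed image and the rotation angle.
+        Cropped blocks: orientation correction only (skip_watermark=True).
+        For whole pages, use prepare_detection_image().
         """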
-        pass
+        if skip_watermark:
+            return self.correct_orientation(image, use_orientation_classifier=True)
+        return self.prepare_detection_image(image, use_orientation_classifier=True)
     
     def _apply_rotation(self, image: np.ndarray, rotation_angle: int) -> np.ndarray:
         """Apply rotation."""
@@ -92,63 +224,30 @@ class BaseLayoutDetector(BaseAdapter):
         # Call the raw detection method implemented by the subclass
         layout_results = self._detect_raw(image, ocr_spans)
         
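-        # Debug mode: print and visualize the detection results before post-processing
-        # Read from the instance attribute first (if present), otherwise from the config
-        # Two configuration styles are supported: debug_mode or debug_options.enabled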
-        debug_mode = getattr(self, 'debug_mode', None)
-        if debug_mode is None:
-            if hasattr(self, 'config'):
-                # Read debug_mode first
-                debug_mode = self.config.get('debug_mode', False)
-                # If debug_mode is not set, fall back to debug_options.enabled
-                if not debug_mode:
-                    debug_options = self.config.get('debug_options', {})
-                    if isinstance(debug_options, dict):
-                        debug_mode = debug_options.get('enabled', False)
-            else:
-                debug_mode = False
-        
+        debug_mode = self._is_layout_debug_enabled()
+        output_dir, page_name = self._resolve_layout_debug_paths()
+        dbg_opts = self._layout_debug_options()
+
         if debug_mode:
-            logger.debug(f"🔍 Layout detection raw results (before post-processing): {len(layout_results)} elements")
-            # logger.debug(f"Raw layout_results: {layout_results}")
-            # Visualize the layout results
-            output_dir = getattr(self, 'output_dir', None)
-            if output_dir is None:
-                if hasattr(self, 'config'):
-                    # Read output_dir first
-                    output_dir = self.config.get('output_dir', None)
-                    # If output_dir is not set, fall back to debug_options.output_dir
-                    if output_dir is None:
-                        debug_options = self.config.get('debug_options', {})
-                        if isinstance(debug_options, dict):
-                            output_dir = debug_options.get('output_dir', None)
-                else:
-                    output_dir = None
-            
-            page_name = getattr(self, 'page_name', None)
-            if page_name is None:
-                if hasattr(self, 'config'):
-                    # Read page_name first
-                    page_name = self.config.get('page_name', None)
-                    # If page_name is not set, fall back to debug_options.prefix
-                    if page_name is None:
-                        debug_options = self.config.get('debug_options', {})
-                        if isinstance(debug_options, dict):
-                            prefix = debug_options.get('prefix', '')
-                            page_name = prefix if prefix else 'layout_detection'
-                    if page_name is None:
-                        page_name = 'layout_detection'
-                else:
-                    page_name = 'layout_detection'
-            
-            if output_dir:
-                self._visualize_layout_results(image, layout_results, output_dir, page_name, suffix='raw')
-        
+            logger.debug(
+                f"Layout detection raw results (before post-processing): "
+                f"{len(layout_results)} elements"
+            )
+            if output_dir and dbg_opts.get('save_raw', True):
+                self._visualize_layout_results(
+                    image, layout_results, output_dir, page_name, suffix='raw'
+                )
+
         # Run post-processing automatically
         if layout_results:
             layout_config = self.config.get('post_process', {}) if hasattr(self, 'config') else {}
             layout_results = self.post_process(layout_results, image, layout_config)
-        
+
+        if debug_mode and output_dir and dbg_opts.get('save_post_processed', True):
+            self._visualize_layout_results(
+                image, layout_results, output_dir, page_name, suffix='post'
+            )
+
         return layout_results
     
     @abstractmethod
@@ -325,116 +424,59 @@ class BaseLayoutDetector(BaseAdapter):
         }
         return category_map.get(category_id, f'unknown_{category_id}')
     
+    def _layout_debug_options(self) -> Dict[str, Any]:
+        opts = self.config.get('debug_options', {})
+        return opts if isinstance(opts, dict) else {}
+
+    def _is_layout_debug_enabled(self) -> bool:
+        debug_mode = getattr(self, 'debug_mode', None)
+        if debug_mode is not None:
+            return bool(debug_mode)
+        if self.config.get('debug_mode', False):
+            return True
+        return bool(self._layout_debug_options().get('enabled', False))
+
+    def _resolve_layout_debug_paths(self) -> Tuple[Optional[str], str]:
+        output_dir = getattr(self, 'output_dir', None)
+        if output_dir is None:
+            output_dir = self.config.get('output_dir')
+        if output_dir is None:
+            output_dir = self._layout_debug_options().get('output_dir')
+        if output_dir is not None:
+            output_dir = str(output_dir)
+
+        page_name = getattr(self, 'page_name', None)
+        if page_name is None:
+            page_name = self.config.get('page_name')
+        if not page_name:
+            prefix = self._layout_debug_options().get('prefix', '')
+            page_name = prefix if prefix else 'layout_detection'
+        return output_dir, str(page_name)
+
     def _visualize_layout_results(
         self,
         image: Union[np.ndarray, Image.Image],
         layout_results: List[Dict[str, Any]],
         output_dir: str,
         page_name: str,
-        suffix: str = 'raw'
+        suffix: str = 'raw',
     ) -> None:
-        """
-        Visualize layout detection results
-        
-        Args:
-            image: input image
-            layout_results: layout detection results
-            output_dir: output directory
-            page_name: page name
-            suffix: filename suffix (e.g. 'raw', 'postprocessed')
-        """
+        """Save layout module debug output (the base image is the inference / detection input)."""
         if not layout_results:
             return
-        
-        try:
-            # Convert to a numpy array
-            if isinstance(image, Image.Image):
-                vis_image = np.array(image)
-                if len(vis_image.shape) == 3 and vis_image.shape[2] == 3:
-                    # PIL RGB -> OpenCV BGR
-                    vis_image = cv2.cvtColor(vis_image, cv2.COLOR_RGB2BGR)
-            else:
-                vis_image = image.copy()
-                if len(vis_image.shape) == 3 and vis_image.shape[2] == 3:
-                    # If the image is RGB, convert it to BGR
-                    vis_image = cv2.cvtColor(vis_image, cv2.COLOR_RGB2BGR)
-            
-            # Category-to-color map (BGR format)
-            category_colors = {
-                'table_body': (0, 0, 255),      # red
-                'table_caption': (0, 0, 200),   # dark red
-                'table_footnote': (0, 0, 150),  # darker red
-                'text': (255, 0, 0),            # blue
-                'title': (0, 255, 255),         # yellow
-                'header': (255, 0, 255),        # purple
-                'footer': (0, 165, 255),        # orange
-                'image_body': (0, 255, 0),      # green
-                'image_caption': (0, 200, 0),   # dark green
-                'image_footnote': (0, 150, 0),  # darker green
-                'abandon': (128, 128, 128),     # gray
-            }
-            
-            # Draw the detection boxes
-            for result in layout_results:
-                bbox = result.get('bbox', [])
-                if not bbox or len(bbox) < 4:
-                    continue
-                
-                category = result.get('category', 'unknown')
-                color = category_colors.get(category, (128, 128, 128))  # gray by default
-                thickness = 2
-                
-                x1, y1, x2, y2 = int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])
-                cv2.rectangle(vis_image, (x1, y1), (x2, y2), color, thickness)
-                
-                # Add the category label
-                label = f"{category}"
-                confidence = result.get('confidence', result.get('score', 0))
-                if confidence:
-                    label += f":{confidence:.2f}"
-                
-                # Compute the text size
-                font = cv2.FONT_HERSHEY_SIMPLEX
-                font_scale = 0.4
-                text_thickness = 1
-                (text_width, text_height), baseline = cv2.getTextSize(label, font, font_scale, text_thickness)
-                
-                # Draw the text background above the box
-                text_y = max(y1 - baseline - 1, text_height + baseline)
-                cv2.rectangle(vis_image, (x1, text_y - text_height - baseline - 2), 
-                            (x1 + text_width, text_y), color, -1)
-                # Draw the text
-                cv2.putText(vis_image, label, (x1, text_y - baseline - 1), 
-                          font, font_scale, (255, 255, 255), text_thickness)
-            
-            # Save the image
-            debug_dir = Path(output_dir) / "debug_comparison" / "layout_detection"
-            debug_dir.mkdir(parents=True, exist_ok=True)
-            output_path = debug_dir / f"{page_name}_layout_{suffix}.jpg"
-            cv2.imwrite(str(output_path), vis_image)
-            logger.info(f"📊 Saved layout detection image ({suffix}): {output_path}")
-            
-            # Save the JSON data
-            json_data = {
-                'page_name': page_name,
-                'suffix': suffix,
-                'count': len(layout_results),
-                'results': [
-                    {
-                        'category': r.get('category'),
-                        'bbox': r.get('bbox'),
-                        'confidence': r.get('confidence', r.get('score', 0.0))
-                    }
-                    for r in layout_results
-                ]
-            }
-            json_path = debug_dir / f"{page_name}_layout_{suffix}.json"
-            with open(json_path, 'w', encoding='utf-8') as f:
-                json.dump(json_data, f, ensure_ascii=False, indent=2)
-            logger.info(f"📊 Saved layout detection JSON ({suffix}): {json_path}")
-            
-        except Exception as e:
-            logger.warning(f"⚠️ Failed to visualize layout results: {e}")
+        from ocr_utils.module_debug_viz import save_layout_debug
+
+        opts = self._layout_debug_options()
+        save_layout_debug(
+            image,
+            layout_results,
+            output_dir,
+            page_name,
+            suffix=suffix,
+            subdir=opts.get('subdir', 'layout_detection'),
+            image_format=opts.get('image_format', 'png'),
+            save_json=bool(opts.get('save_json', True)),
+        )
     
     def _remove_overlapping_boxes(
         self,
@@ -615,7 +657,7 @@ class BaseVLRecognizer(BaseAdapter):
 
 class BaseOCRRecognizer(BaseAdapter):
     """Base class for OCR recognizers."""
-    
+
     @abstractmethod
     def recognize_text(self, image: Union[np.ndarray, Image.Image]) -> List[Dict[str, Any]]:
         """Recognize text."""

+ 62 - 22
ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py

@@ -59,56 +59,96 @@ class MinerUPreprocessor(BasePreprocessor):
         pass
 
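     def remove_watermark(self, image: Union[np.ndarray, Image.Image]) -> np.ndarray:
-        """Page-level watermark removal (call once on the full-page image; do not call again on cropped blocks)."""
+        """Page-level watermark removal plus optional contrast enhancement (call once per full page; do not repeat on cropped blocks)."""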
         if isinstance(image, Image.Image):
             image = np.array(image)
 
         watermark_cfg = self.config.get('watermark_removal', {})
-        if not watermark_cfg.get('enabled', False):
+        wm_enabled = bool(watermark_cfg.get('enabled', False))
+        # Contrast enhancement only takes effect after watermark removal
+        contrast_cfg = watermark_cfg.get('contrast_enhancement', {})
+        contrast_enabled = bool(
+            contrast_cfg.get('enabled', False) if isinstance(contrast_cfg, dict) else False
+        )
+
+        if not wm_enabled:
             return image
 
-        threshold = watermark_cfg.get('threshold', 160)
+        threshold = watermark_cfg.get('threshold', 175)
         morph_close_kernel = watermark_cfg.get('morph_close_kernel', 0)
+        before_image = image.copy()
         try:
             cleaned = remove_watermark_from_image_rgb(
                 image,
                 threshold=threshold,
                 morph_close_kernel=morph_close_kernel,
                 return_pil=False,
+                contrast_enhancement=contrast_cfg if isinstance(contrast_cfg, dict) else None,
+                apply_watermark_removal=wm_enabled,
+                watermark_removal_cfg=watermark_cfg,
             )
-            logger.info(f"🧹 Watermark removed (threshold={threshold})")
-            return cleaned
+            if wm_enabled:
+                method = watermark_cfg.get("method", "threshold")
+                logger.info(
+                    f"🧹 Watermark removed (method={method}, threshold={threshold})"
+                )
+            if contrast_enabled:
+                method = contrast_cfg.get('method', 'clahe') if isinstance(contrast_cfg, dict) else 'clahe'
+                logger.info(f"📈 Contrast enhanced (method={method})")
+            if self._is_watermark_debug_enabled():
+                try:
+                    self._save_watermark_debug_images(
+                        before_image,
+                        np.array(cleaned),
+                        threshold,
+                        morph_close_kernel,
+                        contrast_cfg if isinstance(contrast_cfg, dict) else None,
+                    )
+                except Exception as dbg_e:
+                    logger.warning(f"⚠️ Watermark debug save failed: {dbg_e}")
+            return np.array(cleaned)
         except Exception as e:
-            logger.warning(f"⚠️ Watermark removal failed, using original: {e}")
+            logger.warning(f"⚠️ Watermark/contrast preprocessing failed, using original: {e}")
             return image
 
-    def process(
+    def correct_orientation(
         self,
         image: Union[np.ndarray, Image.Image],
-        skip_watermark: bool = False,
+        *,
+        pdf_rotate_angle: Optional[int] = None,
+        use_orientation_classifier: bool = True,
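     ) -> tuple[np.ndarray, int]:
-        """Image preprocessing: optional watermark removal + orientation correction.
-
-        Args:
-            image: input image
-            skip_watermark: when True, skip watermark removal (page already cleaned, or cropped-block case)
-        """
+        """Orientation correction (PDF metadata rotation or the MinerU orientation classifier)."""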
         if isinstance(image, Image.Image):
             image = np.array(image)
 
-        rotate_angle = 0
-        processed_image = image if skip_watermark else self.remove_watermark(image)
+        if pdf_rotate_angle:
+            return super().correct_orientation(
+                image,
+                pdf_rotate_angle=pdf_rotate_angle,
+                use_orientation_classifier=False,
+            )
 
-        # Orientation correction
-        if self.orientation_classifier is not None:
+        rotate_angle = 0
+        if use_orientation_classifier and self.orientation_classifier is not None:
             try:
-                rotate_angle = int(self.orientation_classifier.predict(processed_image))
-                processed_image = self._apply_rotation(processed_image, rotate_angle)
+                rotate_angle = int(self.orientation_classifier.predict(image))
+                image = self._apply_rotation(image, rotate_angle)
                 logger.info(f"📐 Applied rotation: {rotate_angle}")
             except Exception as e:
                 logger.error(f"⚠️ Orientation classification failed: {e}")
 
-        return processed_image, rotate_angle
+        return image, rotate_angle
+
+    def process(
+        self,
+        image: Union[np.ndarray, Image.Image],
+        skip_watermark: bool = False,
+    ) -> tuple[np.ndarray, int]:
+        """Cropped blocks get orientation correction only; for whole pages use prepare_detection_image()."""
+        if skip_watermark:
+            return self.correct_orientation(image, use_orientation_classifier=True)
+        return self.prepare_detection_image(image, use_orientation_classifier=True)
 
 class MinerULayoutDetector(BaseLayoutDetector):
     """MinerU layout detection adapter."""
@@ -653,7 +693,7 @@ class MinerUOCRRecognizer(BaseOCRRecognizer):
                             'text': item[1][0],  # recognized text
                             'confidence': item[1][1]  # confidence score
                         })
-                        
+
             return formatted_results
             
         except Exception as e:
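
The `order` switch read by `prepare_detection_image()` in the base adapter can be sketched independently of the adapters. This is a minimal illustration rather than the project's code: `resolve_order`, `run_page_preprocess`, `fake_orient` and `fake_clean` are hypothetical names, and the two callables stand in for `correct_orientation()` and `remove_watermark()`; strings replace images so the step order is visible.

```python
from typing import Any, Callable, Dict, Tuple

VALID_ORDERS = ("orient_first", "watermark_first")


def resolve_order(config: Dict[str, Any]) -> str:
    """Normalize the configured order; unknown values fall back to orient_first."""
    order = str(config.get("order", "orient_first")).strip().lower()
    return order if order in VALID_ORDERS else "orient_first"


def run_page_preprocess(
    image: Any,
    config: Dict[str, Any],
    orient: Callable[[Any], Tuple[Any, int]],
    remove_watermark: Callable[[Any], Any],
) -> Tuple[Any, int]:
    """Run both page-level steps in the configured order; return (image, angle)."""
    if resolve_order(config) == "watermark_first":
        # Clean first, then rotate the cleaned image.
        return orient(remove_watermark(image))
    # Default: rotate first, then clean the upright image.
    oriented, angle = orient(image)
    return remove_watermark(oriented), angle


def fake_orient(img: str) -> Tuple[str, int]:
    """Stand-in for correct_orientation(): tags the image and reports 90 degrees."""
    return img + "|rot", 90


def fake_clean(img: str) -> str:
    """Stand-in for remove_watermark(): tags the image as cleaned."""
    return img + "|clean"
```

With these stand-ins, the default order yields an image tagged `|rot|clean`, while `{"order": "watermark_first"}` yields `|clean|rot`; the reported angle is the same in both cases.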

+ 20 - 11
ocr_tools/universal_doc_parser/models/adapters/mineru_wired_table.py

@@ -60,7 +60,11 @@ class MinerUWiredTableRecognizer:
 
         # Initialize the functional modules
         self.debug_utils = WiredTableDebugUtils()
-        self.debug_options = self.debug_utils.merge_debug_options(self.config, self.config.get("debug_options"))
+        self.debug_options = self.debug_utils.merge_debug_options(
+            self.config,
+            self.config.get("debug_options"),
+            default_subdir="table_recognition_wired",
+        )
         self.ocr_formatter = OCRFormatter()
         self.skew_detector = SkewDetector(self.config)
         self.grid_recovery = GridRecovery()
@@ -239,10 +243,13 @@ class MinerUWiredTableRecognizer:
             h, w = table_image.shape[:2]
             
             # Merge debug options (must be initialized before run_unet, because the inner function references them)
-            dbg = self.debug_utils.merge_debug_options(self.config, debug_options or {})
-            debug_dir = None
-            if dbg and dbg.enabled and dbg.output_dir:
-                debug_dir = dbg.output_dir
+            dbg = self.debug_utils.merge_debug_options(
+                self.config,
+                debug_options or {},
+                default_subdir="table_recognition_wired",
+            )
+            debug_root = self.debug_utils.resolve_debug_output_dir(dbg)
+            debug_dir = str(debug_root) if debug_root else None
             
             # Define an inner function so UNet inference can be reused
             def run_unet(img_in):
@@ -461,14 +468,15 @@ class MinerUWiredTableRecognizer:
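             # Strategy change: run cropped OCR on all cells by default, to fix wrongly merged headers and misassigned text.
             # Full-page OCR results serve only as a fallback (the logic in text_filling.py: keep the original value only if the cropped OCR result is empty or low-scoring)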
             if hasattr(self, 'ocr_engine') and self.ocr_engine:
-                # Get the output directory from debug_options
-                output_dir = dbg.output_dir if dbg and dbg.enabled else None
+                cell_ocr_dir = None
+                if debug_root is not None:
+                    cell_ocr_dir = str(debug_root / "tablecell_ocr")
                 texts = self.text_filler.second_pass_ocr_fill(
-                    table_image, bboxes_merged, texts, scores, 
+                    table_image, bboxes_merged, texts, scores,
                     need_reocr_indices=need_reocr_indices,
                     pdf_type=pdf_type,
                     force_all=False,  # Force Per-Cell OCR
-                    output_dir=output_dir
+                    output_dir=cell_ocr_dir,
                 )
 
             for i, cell in enumerate(merged_cells):
@@ -537,8 +545,9 @@ class MinerUWiredTableRecognizer:
             try:
                 # Merge the debug_options passed in
                 merged_debug_opts = self.debug_utils.merge_debug_options(
-                    self.config, 
-                    override=debug_options or self.debug_options.__dict__
+                    self.config,
+                    override=debug_options or self.debug_options.__dict__,
+                    default_subdir="table_recognition_wired",
                 )
                 return self.recognize_v4(table_image, ocr_boxes, pdf_type=pdf_type, debug_options=merged_debug_opts.__dict__)
             except Exception:

+ 6 - 3
ocr_tools/universal_doc_parser/models/adapters/paddle_table_classifier.py

@@ -57,7 +57,9 @@ class PaddleTableClassifier(BaseAdapter):
         
         # Initialize the debug tools
         self.debug_utils = WiredTableDebugUtils()
-        self.debug_options = self.debug_utils.merge_debug_options(self.config)
+        self.debug_options = self.debug_utils.merge_debug_options(
+            self.config, default_subdir="table_classification"
+        )
         self.visualizer = WiredTableVisualizer()
         
     def initialize(self):
@@ -106,8 +108,9 @@ class PaddleTableClassifier(BaseAdapter):
             
         # Merge debug options
         merged_debug_opts = self.debug_utils.merge_debug_options(
-            self.config, 
-            override=debug_options
+            self.config,
+            override=debug_options,
+            default_subdir="table_classification",
         )
         
         try:

+ 38 - 39
ocr_tools/universal_doc_parser/models/adapters/wired_table/debug_utils.py

@@ -3,6 +3,7 @@
 
 Provides debug option management and path generation.
 """
+from pathlib import Path
 from typing import Dict, Any, Optional
 from dataclasses import dataclass
 
@@ -12,6 +13,7 @@ class WiredTableDebugOptions:
     """Debug options dataclass."""
     enabled: bool = False
     output_dir: Optional[str] = None
+    subdir: str = "table_recognition_wired"
     save_table_lines: bool = False
     save_connected_components: bool = False
     save_grid_structure: bool = False
@@ -22,86 +24,83 @@ class WiredTableDebugOptions:
 
 class WiredTableDebugUtils:
     """Debug utility class."""
-    
+
     @staticmethod
     def merge_debug_options(
         config: Dict[str, Any],
-        override: Optional[Dict[str, Any]] = None
+        override: Optional[Dict[str, Any]] = None,
+        *,
+        default_subdir: str = "table_recognition_wired",
     ) -> WiredTableDebugOptions:
         """
         Merge debug options.
-        
+
         Args:
             config: configuration dict
             override: dict of overriding options
-            
+            default_subdir: default value used when subdir is not configured
+
         Returns:
             The merged debug options
         """
         debug_config = config.get("debug_options", {})
         if not isinstance(debug_config, dict):
-            # Backward compatibility with old configs: if it is not a dict, try treating it as a boolean or fall back
             debug_config = {}
 
         opts = WiredTableDebugOptions(
             enabled=bool(debug_config.get("enabled", False)),
             output_dir=debug_config.get("output_dir"),
+            subdir=str(debug_config.get("subdir") or default_subdir),
             save_table_lines=bool(debug_config.get("save_table_lines", False)),
-            save_connected_components=bool(debug_config.get("save_connected_components", False)),
+            save_connected_components=bool(
+                debug_config.get("save_connected_components", False)
+            ),
             save_grid_structure=bool(debug_config.get("save_grid_structure", False)),
             save_text_overlay=bool(debug_config.get("save_text_overlay", False)),
             image_format=str(debug_config.get("image_format", "png")),
             prefix=str(debug_config.get("prefix", "")),
         )
-        
+
         if override and isinstance(override, dict):
-            # The override layer allows temporarily enabling debug or specifying a directory
             for k, v in override.items():
-                if hasattr(opts, k):
+                if hasattr(opts, k) and v is not None:
                     setattr(opts, k, v)
-        
+
         return opts
-    
+
+    @staticmethod
+    def resolve_debug_output_dir(
+        opts: Optional[WiredTableDebugOptions],
+    ) -> Optional[Path]:
+        """``{output_dir}/debug/{subdir}/``, consistent with the layout/ocr module debug output."""
+        if not opts or not opts.enabled or not opts.output_dir:
+            return None
+        from ocr_utils.module_debug_viz import resolve_module_debug_dir
+
+        return resolve_module_debug_dir(
+            opts.output_dir, opts.subdir or "table_recognition_wired"
+        )
+
     @staticmethod
     def debug_is_on(
         flag: str,
-        opts: Optional[WiredTableDebugOptions] = None
+        opts: Optional[WiredTableDebugOptions] = None,
     ) -> bool:
-        """
-        Check whether a debug flag is enabled.
-        
-        Args:
-            flag: name of the debug flag
-            opts: debug options (optional)
-            
-        Returns:
-            Whether the flag is enabled
-        """
         if not opts or not opts.enabled:
             return False
         if not opts.output_dir:
             return False
         return bool(getattr(opts, flag, False))
-    
+
     @staticmethod
     def debug_path(
         name: str,
-        opts: Optional[WiredTableDebugOptions] = None
+        opts: Optional[WiredTableDebugOptions] = None,
     ) -> Optional[str]:
-        """
-        Build the path of a debug file.
-        
-        Args:
-            name: file name (without extension)
-            opts: debug options (optional)
-            
-        Returns:
-            The full file path, or None if debugging is not enabled
-        """
-        if not opts or not opts.output_dir:
+        debug_dir = WiredTableDebugUtils.resolve_debug_output_dir(opts)
+        if debug_dir is None:
             return None
-        
-        prefix = (opts.prefix + "_") if opts.prefix else ""
-        ext = opts.image_format or "png"
-        return f"{opts.output_dir}/{prefix}{name}.{ext}"
 
+        prefix = (opts.prefix + "_") if opts and opts.prefix else ""
+        ext = (opts.image_format if opts else None) or "png"
+        return str(debug_dir / f"{prefix}{name}.{ext}")
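
The two behavioral changes in `merge_debug_options` (a per-module `default_subdir`, and overrides that skip `None` values) can be sketched as follows. This is a reduced illustration under stated assumptions: `DebugOptions` and `merge_options` are hypothetical stand-ins that keep only three of the fields of `WiredTableDebugOptions`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DebugOptions:
    """Reduced stand-in for WiredTableDebugOptions (three fields only)."""
    enabled: bool = False
    output_dir: Optional[str] = None
    subdir: str = "table_recognition_wired"


def merge_options(
    config: Dict[str, Any],
    override: Optional[Dict[str, Any]] = None,
    *,
    default_subdir: str = "table_recognition_wired",
) -> DebugOptions:
    """Build options from config, then apply non-None overrides."""
    raw = config.get("debug_options", {})
    if not isinstance(raw, dict):
        raw = {}  # non-dict values are ignored rather than interpreted
    opts = DebugOptions(
        enabled=bool(raw.get("enabled", False)),
        output_dir=raw.get("output_dir"),
        # An empty or missing subdir falls back to the caller's default.
        subdir=str(raw.get("subdir") or default_subdir),
    )
    for key, value in (override or {}).items():
        # None means "not specified", so it must not clobber configured values.
        if hasattr(opts, key) and value is not None:
            setattr(opts, key, value)
    return opts
```

Skipping `None` matters when an override is built from another options object's `__dict__`, as the wired-table recognizer does: unset fields in that dict would otherwise erase a configured `output_dir`.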

+ 2 - 3
ocr_tools/universal_doc_parser/models/adapters/wired_table/text_filling.py

@@ -471,7 +471,7 @@ class TextFiller:
             need_reocr_indices: indices of cells that need a second OCR pass (results of OCR mis-merge detection)
             pdf_type: str,  # 'ocr' or 'txt'
             force_all: whether to force OCR on all cells (Default: False)
-            output_dir: output directory; if provided, cell OCR images are saved under {output_dir}/tablecell_ocr/
+            output_dir: cell OCR debug directory (usually debug/table_recognition_wired/tablecell_ocr/)
         """
         try:
             if not self.ocr_engine:
@@ -485,10 +485,9 @@ class TextFiller:
             if need_reocr_indices is None:
                 need_reocr_indices = []
 
-            # If an output directory is provided, create the tablecell_ocr subdirectory
             cell_ocr_dir = None
             if output_dir:
-                cell_ocr_dir = os.path.join(output_dir, "tablecell_ocr")
+                cell_ocr_dir = output_dir
                 os.makedirs(cell_ocr_dir, exist_ok=True)
 
             h_img, w_img = table_image.shape[:2]
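
After this change the caller, not `TextFiller`, decides where cell OCR crops go. The resulting layout can be sketched with pure path arithmetic. The helper names `module_debug_dir` and `cell_ocr_dir` are hypothetical, and unlike `resolve_module_debug_dir` this sketch does not create any directory.

```python
from pathlib import PurePosixPath
from typing import Optional


def module_debug_dir(
    output_dir: str, subdir: str, debug_root: str = "debug"
) -> PurePosixPath:
    """Compose {output_dir}/{debug_root}/{subdir} without touching the filesystem."""
    return PurePosixPath(output_dir) / debug_root / subdir


def cell_ocr_dir(
    output_dir: Optional[str],
    enabled: bool,
    subdir: str = "table_recognition_wired",
) -> Optional[str]:
    """Return the per-cell OCR debug directory, or None when debugging is off."""
    if not enabled or not output_dir:
        return None
    return str(module_debug_dir(output_dir, subdir) / "tablecell_ocr")
```

For an enabled run with `output_dir="out/page"`, the crops land in `out/page/debug/table_recognition_wired/tablecell_ocr`.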

+ 231 - 0
ocr_utils/module_debug_viz.py

@@ -0,0 +1,231 @@
+"""
+Module-level debug visualization (Layout / OCR)
+
+Produces debug images based on inference_image under ``{output_dir}/debug/{subdir}/``;
+user-facing audit images are handled by VisualizationUtils + original_image, not by this module.
+"""
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Union
+
+import cv2
+import numpy as np
+from loguru import logger
+from PIL import Image
+
+# Default root directory for each module's debug_options output (relative to the pipeline output_dir)
+MODULE_DEBUG_ROOT = "debug"
+
+
+def resolve_module_debug_dir(
+    output_dir: Union[str, Path],
+    subdir: str,
+    *,
+    debug_root: str = MODULE_DEBUG_ROOT,
+) -> Path:
+    """``{output_dir}/{debug_root}/{subdir}/``; the directory is created if it does not exist."""
+    debug_dir = Path(output_dir) / debug_root / subdir
+    debug_dir.mkdir(parents=True, exist_ok=True)
+    return debug_dir
+
+
+LAYOUT_CATEGORY_COLORS_BGR = {
+    'table_body': (0, 0, 255),
+    'table_caption': (0, 0, 200),
+    'table_footnote': (0, 0, 150),
+    'text': (255, 0, 0),
+    'title': (0, 255, 255),
+    'header': (255, 0, 255),
+    'footer': (0, 165, 255),
+    'image_body': (0, 255, 0),
+    'image_caption': (0, 200, 0),
+    'image_footnote': (0, 150, 0),
+    'abandon': (128, 128, 128),
+}
+
+# Bright blue (BGR): easier to see than yellow on white or light-gray bank statements; distinct from the red layout boxes
+OCR_BOX_COLOR_BGR = (255, 0, 0)
+OCR_BOX_LINE_THICKNESS = 2
+
+
+def _to_bgr(image: Union[np.ndarray, Image.Image]) -> np.ndarray:
+    if isinstance(image, Image.Image):
+        arr = np.array(image)
+    else:
+        arr = image.copy()
+    if arr.ndim == 2:
+        return cv2.cvtColor(arr, cv2.COLOR_GRAY2BGR)
+    if arr.shape[2] == 3:
+        return cv2.cvtColor(arr, cv2.COLOR_RGB2BGR)
+    return arr
+
+
+def draw_layout_boxes_cv2(
+    image: Union[np.ndarray, Image.Image],
+    layout_results: List[Dict[str, Any]],
+) -> np.ndarray:
+    """Draw layout detection boxes on a BGR image and return the new image."""
+    vis = _to_bgr(image)
+    for result in layout_results:
+        bbox = result.get('bbox', [])
+        if not bbox or len(bbox) < 4:
+            continue
+        category = result.get('category', 'unknown')
+        color = LAYOUT_CATEGORY_COLORS_BGR.get(category, (128, 128, 128))
+        x1, y1, x2, y2 = int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])
+        cv2.rectangle(vis, (x1, y1), (x2, y2), color, 2)
+        label = category
+        confidence = result.get('confidence', result.get('score', 0))
+        if confidence:
+            label += f":{float(confidence):.2f}"
+        font = cv2.FONT_HERSHEY_SIMPLEX
+        font_scale = 0.4
+        text_thickness = 1
+        (text_width, text_height), baseline = cv2.getTextSize(
+            label, font, font_scale, text_thickness
+        )
+        text_y = max(y1 - baseline - 1, text_height + baseline)
+        cv2.rectangle(
+            vis,
+            (x1, text_y - text_height - baseline - 2),
+            (x1 + text_width, text_y),
+            color,
+            -1,
+        )
+        cv2.putText(
+            vis, label, (x1, text_y - baseline - 1),
+            font, font_scale, (255, 255, 255), text_thickness,
+        )
+    return vis
+
+
+def draw_ocr_spans_cv2(
+    image: Union[np.ndarray, Image.Image],
+    spans: List[Dict[str, Any]],
+    *,
+    max_label_chars: int = 12,
+) -> np.ndarray:
+    """Draw OCR spans (poly or bbox) on a BGR image."""
+    vis = _to_bgr(image)
+    for span in spans:
+        poly = span.get('poly')
+        bbox = span.get('bbox', [])
+        pts = None
+        if poly and len(poly) >= 4:
+            pts = np.array(poly, dtype=np.int32).reshape(-1, 2)
+        elif bbox and len(bbox) >= 4:
+            x0, y0, x1, y1 = map(int, bbox[:4])
+            pts = np.array(
+                [[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=np.int32
+            )
+        if pts is not None:
+            cv2.polylines(
+                vis, [pts], True, OCR_BOX_COLOR_BGR, OCR_BOX_LINE_THICKNESS
+            )
+        text = str(span.get('text', ''))[:max_label_chars]
+        if text and pts is not None:
+            x, y = int(pts[0][0]), int(pts[0][1])
+            cv2.putText(
+                vis, text, (x, max(y - 2, 10)),
+                cv2.FONT_HERSHEY_SIMPLEX, 0.35, OCR_BOX_COLOR_BGR, 1, cv2.LINE_AA,
+            )
+    return vis
+
+
+def save_layout_debug(
+    image: Union[np.ndarray, Image.Image],
+    layout_results: List[Dict[str, Any]],
+    output_dir: Union[str, Path],
+    page_name: str,
+    *,
+    suffix: str = 'raw',
+    subdir: str = 'layout_detection',
+    image_format: str = 'jpg',
+    save_json: bool = True,
+) -> Optional[Dict[str, str]]:
+    """Save the layout module's debug image and JSON."""
+    if not layout_results or not output_dir:
+        return None
+    try:
+        fmt = (image_format or 'jpg').lstrip('.')
+        debug_dir = resolve_module_debug_dir(output_dir, subdir)
+        vis = draw_layout_boxes_cv2(image, layout_results)
+        img_path = debug_dir / f'{page_name}_layout_{suffix}.{fmt}'
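+        # cv2.imwrite returns False on failure instead of raising.
+        if not cv2.imwrite(str(img_path), vis):
+            raise OSError(f"cv2.imwrite failed for {img_path}")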
+        paths: Dict[str, str] = {'image': str(img_path)}
+        logger.info(f"Saved layout detection image ({suffix}): {img_path}")
+        if save_json:
+            json_data = {
+                'page_name': page_name,
+                'suffix': suffix,
+                'count': len(layout_results),
+                'results': [
+                    {
+                        'category': r.get('category'),
+                        'bbox': r.get('bbox'),
+                        'confidence': r.get('confidence', r.get('score', 0.0)),
+                    }
+                    for r in layout_results
+                ],
+            }
+            json_path = debug_dir / f'{page_name}_layout_{suffix}.json'
+            json_path.write_text(
+                json.dumps(json_data, ensure_ascii=False, indent=2),
+                encoding='utf-8',
+            )
+            paths['json'] = str(json_path)
+            logger.info(f"Saved layout detection JSON ({suffix}): {json_path}")
+        return paths
+    except Exception as e:
+        logger.warning(f"Failed to save layout debug ({suffix}): {e}")
+        return None
+
+
+def save_ocr_debug(
+    image: Union[np.ndarray, Image.Image],
+    spans: List[Dict[str, Any]],
+    output_dir: Union[str, Path],
+    page_name: str,
+    *,
+    subdir: str = 'ocr_recognition',
+    image_format: str = 'png',
+    save_json: bool = True,
+) -> Optional[Dict[str, str]]:
+    """Save the OCR module's debug image and JSON."""
+    if not output_dir:
+        return None
+    try:
+        fmt = (image_format or 'png').lstrip('.')
+        debug_dir = resolve_module_debug_dir(output_dir, subdir)
+        vis = draw_ocr_spans_cv2(image, spans or [])
+        img_path = debug_dir / f'{page_name}_ocr_spans.{fmt}'
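+        # cv2.imwrite returns False on failure instead of raising.
+        if not cv2.imwrite(str(img_path), vis):
+            raise OSError(f"cv2.imwrite failed for {img_path}")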
+        paths: Dict[str, str] = {'image': str(img_path)}
+        logger.info(f"Saved OCR debug image: {img_path}")
+        if save_json:
+            json_data = {
+                'page_name': page_name,
+                'count': len(spans or []),
+                'spans': [
+                    {
+                        'bbox': s.get('bbox'),
+                        'poly': s.get('poly'),
+                        'text': s.get('text'),
+                        'confidence': s.get('confidence'),
+                    }
+                    for s in (spans or [])
+                ],
+            }
+            json_path = debug_dir / f'{page_name}_ocr_spans.json'
+            json_path.write_text(
+                json.dumps(json_data, ensure_ascii=False, indent=2),
+                encoding='utf-8',
+            )
+            paths['json'] = str(json_path)
+            logger.info(f"Saved OCR debug JSON: {json_path}")
+        return paths
+    except Exception as e:
+        logger.warning(f"Failed to save OCR debug: {e}")
+        return None
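
Review note: the JSON sidecar that `save_layout_debug` writes can be exercised without OpenCV. The following is a minimal sketch, not the code in this diff. `build_layout_payload` and `write_sidecar` are hypothetical names introduced for illustration, and the sketch assumes the caller passes in the already resolved debug directory, since the behaviour of `resolve_module_debug_dir` is not visible in this hunk.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def build_layout_payload(
    page_name: str, suffix: str, layout_results: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the JSON structure built in save_layout_debug."""
    return {
        'page_name': page_name,
        'suffix': suffix,
        'count': len(layout_results),
        'results': [
            {
                'category': r.get('category'),
                'bbox': r.get('bbox'),
                # Same fallback chain as the diff: 'confidence', then 'score', then 0.0.
                'confidence': r.get('confidence', r.get('score', 0.0)),
            }
            for r in layout_results
        ],
    }


def write_sidecar(
    debug_dir: Path, page_name: str, suffix: str, payload: Dict[str, Any]
) -> Path:
    """Hypothetical helper: write the payload using the diff's file naming scheme."""
    # Assumption: the directory may not exist yet, so create it here.
    debug_dir.mkdir(parents=True, exist_ok=True)
    json_path = debug_dir / f'{page_name}_layout_{suffix}.json'
    json_path.write_text(
        # ensure_ascii=False keeps non-ASCII category names readable in the file.
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding='utf-8',
    )
    return json_path


if __name__ == '__main__':
    results = [{'category': 'table', 'bbox': [10, 20, 110, 220], 'score': 0.87}]
    with tempfile.TemporaryDirectory() as tmp:
        out = write_sidecar(
            Path(tmp) / 'layout_detection', 'page_001', 'raw',
            build_layout_payload('page_001', 'raw', results),
        )
        print(out.read_text(encoding='utf-8'))
```

Because `dict.get` only falls back when the key is absent, a result that carries `'confidence': None` alongside a valid `'score'` is serialized with a null confidence; that matches the behaviour of the code in the diff.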

File diff suppressed because it is too large
+ 1154 - 2
ocr_utils/watermark_utils.py


Some files were not shown because too many files changed in this diff