SHA1
--- a/docs/ocr_tools/universal_doc_parser/HF_safetensors->GGUF.md
+++ b/docs/ocr_tools/universal_doc_parser/HF_safetensors->GGUF.md
@@ -0,0 +1,177 @@
 
				+确认完毕，给你结论和可执行方案。
			
 
				+
			
 
				+## 关键事实
			
 
				+
			
 
				+1. **llama.cpp 已原生支持 `paddleocr_vl` 架构**（build b8110，2026-02 合入）。你现在能跑 1.5-GGUF 就证明你的 llama.cpp 已支持。
			
 
				+2. **1.6 与 1.5 架构「完全兼容」**（官方原话 "zero-cost plug-and-play migration"）。这意味着 GGUF 转换/推理路径对 1.6 完全适用——没有任何适配成本。
			
 
				+3. 1.6 的 HF model tree 已显示 "Quantizations 2 models"，社区量化正在跟进（`noctrex` 这类作者对 1.5 更新很快）。
			
 
				+
			
 
				+所以你**不用等官方**，有两条高效路径。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方案 A（最省事）：直接找/等社区 GGUF
			
 
				+
			
 
				+因为架构完全兼容，社区转 1.6 是零成本的。优先去这两个地方蹲：
			
 
				+
			
 
				+- [`noctrex/PaddleOCR-VL-1.5-GGUF`](https://huggingface.co/noctrex/PaddleOCR-VL-1.5-GGUF)（看是否出 1.6 仓库）
			
 
				+- 1.6 官方页的 [Quantizations 链接](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.6)（已挂 2 个量化）
			
 
				+
			
 
				+找到后，**你现有的 daemon 脚本几乎不用改**，只改 3 个变量（路径 + 别名）即可。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方案 B（最可控，推荐）：自己转换 1.6 → GGUF
			
 
				+
			
 
				+模型只有 0.9B，M4 Pro 48G 转换+量化几分钟搞定。步骤：
			
 
				+
			
 
				+```bash
			
 
				+conda activate mineru
			
 
				+
			
 
				+# 1. 拿最新 llama.cpp（确保 convert 脚本支持 paddleocr_vl）
			
 
				+git clone https://github.com/ggml-org/llama.cpp.git
			
 
				+cd llama.cpp
			
 
				+pip install -r requirements.txt    # 转换脚本依赖
			
 
				+cmake -B build -DGGML_METAL=ON && cmake --build build -j
			
 
				+
			
 
				+# 2. 下载 1.6 原始权重（safetensors）
			
 
				+conda activate mineru
			
 
				+hf download PaddlePaddle/PaddleOCR-VL-1.6 --local-dir ~/models/PaddleOCR-VL-1.6
			
 
				+hf download opendatalab/MinerU2.5-Pro-2604-1.2B --local-dir ~/models/MinerU2.5-Pro-2604-1.2B
			
 
				+
			
 
				+# 3. 转语言模型主体（M4 48G 内存充足，直接 f16 不量化质量最好）
			
 
				+python convert_hf_to_gguf.py ~/models/PaddleOCR-VL-1.6 \
			
 
				+    --outfile ~/models/PaddleOCR-VL-1.6-GGUF \
			
 
				+    --outtype f16
			
 
				+
			
 
				+# 4. 转视觉投影器 mmproj（OCR 质量主要看这个，建议 f16/f32）
			
 
				+python convert_hf_to_gguf.py ~/models/PaddleOCR-VL-1.6 \
			
 
				+    --mmproj \
			
 
				+    --outfile ~/models/PaddleOCR-VL-1.6-GGUF/PaddleOCR-VL-1.6-mmproj.gguf \
			
 
				+    --outtype f16
			
 
				+cp ~/models/PaddleOCR-VL-1.6/chat_template.jinja ~/models/PaddleOCR-VL-1.6-GGUF/chat_template.jinja
			
 
				+
			
 
				+# 5. 转语言模型主体（M4 48G 内存充足，直接 f16 不量化质量最好）
			
 
				+python convert_hf_to_gguf.py ~/models/MinerU2.5-Pro-2604-1.2B \
			
 
				+    --outfile ~/models/MinerU2.5-Pro-2604-1.2B-GGUF \
			
 
				+    --outtype f16
			
 
				+
			
 
				+# 6. 转视觉投影器 mmproj（OCR 质量主要看这个，建议 f16/f32）
			
 
				+python convert_hf_to_gguf.py ~/models/MinerU2.5-Pro-2604-1.2B \
			
 
				+    --mmproj \
			
 
				+    --outfile ~/models/MinerU2.5-Pro-2604-1.2B-GGUF/MinerU2.5-Pro-2604-1.2B-mmproj.gguf \
			
 
				+    --outtype f16
			
 
				+cp ~/models/MinerU2.5-Pro-2604-1.2B/chat_template.jinja ~/models/MinerU2.5-Pro-2604-1.2B-GGUF/chat_template.jinja
			
 
				+
			
 
				+```
			
 
				+
			
 
				+> 模型总共才 ~1.5GB，48G 内存下**不建议量化**主模型（Q4 会掉 OCR 精度），直接 f16 即可；mmproj 用 f16 或 f32 质量最佳。
			
 
				+
			
 
				+转完后，把你 `paddle_local_daemon.sh` 里这几行指向新文件就行：
			
 
				+
			
 
				+```bash
			
 
				+HF_CACHE="$HOME/models/paddleocr_vl"
			
 
				+MODEL_PATH="$HF_CACHE/PaddleOCR-VL-1.6.gguf"
			
 
				+MMPROJ_PATH="$HF_CACHE/PaddleOCR-VL-1.6-mmproj.gguf"
			
 
				+MODEL_NAME="PaddleOCR-VL-1.6"
			
 
				+```
			
 
				+
			
 
				+#### 验证模型
			
 
				+```bash
			
 
				+# llama-server -hf ~/models/PaddleOCR-VL-1.6-GGUF
			
 
				+# llama-server -hf ~/models/MinerU2.5-Pro-2604-1.2B-GGUF
			
 
				+ocr_tools/daemons/mineru_local_daemon.sh start
			
 
				+
			
 
				+ocr_tools/daemons/curl_local_mineru.sh
			
 
				+
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 一个重要的坑：chat template
			
 
				+
			
 
				+社区/自转的 GGUF 如果不带模板，调用时会因为缺少图像占位符 token（`<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>`）而**崩溃**。你当前脚本能跑 1.5 官方 GGUF 是因为模板已内嵌。如果换 1.6 后启动报错或输出异常，给 `llama-server` 加上：
			
 
				+
			
 
				+```bash
			
 
				+--jinja --chat-template-file /path/to/chat_template.jinja
			
 
				+```
			
 
				+
			
 
				+`chat_template.jinja` 从 GGUF 仓库下载，或从 1.6 的 `tokenizer_config.json` 里的 `chat_template` 字段提取。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 又一个更隐蔽的坑：OTSL 结构 token 被当 special 过滤（MinerU2.5 实测）
			
 
				+
			
 
				+**现象**：MinerU2.5-Pro-2604 转成 GGUF 后，走 `/v1/chat/completions` 做表格识别，
			
 
				+返回 `content` 里**文字全对，但完全没有 OTSL 结构 token**（`<fcel>`/`<nl>`/`<ecel>` 一个都没有，
			
 
				+连字面 `<` 都没有），表格塌成纯文本，后续 `convert_otsl_to_html` 无法还原表格。
			
 
				+
			
 
				+**根因**：MinerU2.5 的 OTSL token 在 `tokenizer_config.json` / `tokenizer.json` 中被标记为
			
 
				+`"special": true` 并列入 `additional_special_tokens`。转 GGUF 后它们成为 **CONTROL/special token**，
			
 
				+而 **llama-server 的 chat/completions 默认不会把 special token 输出到 `content`**——
			
 
				+模型其实生成了 OTSL，但在返回时被剥离。
			
 
				+
			
 
				+**铁证对比**（两者都用 OTSL，差别只在 special 标记）：
			
 
				+
			
 
				+| 模型 | OTSL token `special` | chat content 输出 OTSL |
			
 
				+|---|---|---|
			
 
				+| PaddleOCR-VL-1.6 | `false`（USER_DEFINED） | ✅ 正常输出 `<fcel>...` |
			
 
				+| MinerU2.5-Pro-2604 | `true`（CONTROL） | ❌ 被过滤，只剩纯文本 |
			
 
				+
			
 
				+> 注意：社区现成 GGUF（如 `mradermacher/MinerU2.5-Pro-2604-1.2B-GGUF`）基于同样的 tokenizer 转换，
			
 
				+> **同样有这个坑**，换现成包也不行。
			
 
				+
			
 
				+**解决（转主模型前，把 7 个 OTSL token 改为非 special，再重转；mmproj 无需重转）**：
			
 
				+
			
 
				+```bash
			
 
				+conda activate mineru
			
 
				+python - <<'PY'
			
 
				+import json, os
			
 
				+d = os.path.expanduser("~/models/MinerU2.5-Pro-2604-1.2B")
			
 
				+otsl = {"<ched>", "<ecel>", "<fcel>", "<lcel>", "<ucel>", "<xcel>", "<nl>"}
			
 
				+
			
 
				+# 1) tokenizer.json：convert_hf_to_gguf 据此决定 token 类型（关键）
			
 
				+#    注意 indent=2：保持与原始一致的多行格式，避免被压成一行导致 diff 巨大
			
 
				+p = os.path.join(d, "tokenizer.json")
			
 
				+t = json.load(open(p, encoding="utf-8"))
			
 
				+for tok in t.get("added_tokens", []):
			
 
				+    if tok.get("content") in otsl:
			
 
				+        tok["special"] = False
			
 
				+json.dump(t, open(p, "w", encoding="utf-8"), ensure_ascii=False, indent=2)
			
 
				+
			
 
				+# 2) tokenizer_config.json：保持一致 + 从 additional_special_tokens 移除
			
 
				+p = os.path.join(d, "tokenizer_config.json")
			
 
				+c = json.load(open(p, encoding="utf-8"))
			
 
				+for v in c.get("added_tokens_decoder", {}).values():
			
 
				+    if v.get("content") in otsl:
			
 
				+        v["special"] = False
			
 
				+c["additional_special_tokens"] = [x for x in c.get("additional_special_tokens", []) if x not in otsl]
			
 
				+json.dump(c, open(p, "w", encoding="utf-8"), ensure_ascii=False, indent=2)
			
 
				+print("已将 OTSL token 标记为非 special，可重新转换主模型 GGUF")
			
 
				+PY
			
 
				+
			
 
				+# 重转主模型（mmproj 不用动）
			
 
				+python convert_hf_to_gguf.py ~/models/MinerU2.5-Pro-2604-1.2B \
			
 
				+    --outfile ~/models/MinerU2.5-Pro-2604-1.2B-GGUF \
			
 
				+    --outtype f16
			
 
				+```
			
 
				+
			
 
				+**验证**：重启 daemon 后再跑 `ocr_tools/daemons/curl_local_mineru.sh`，
			
 
				+`content` 里应能看到 `<fcel>`/`<nl>` 等结构 token。
			
 
				+
			
 
				+> 经验法则：**凡是需要出现在模型输出文本里的结构/标记 token（OTSL、Markdown 边界等），
			
 
				+> 转 GGUF 前都应确保它们是非 special（USER_DEFINED）**，否则会被 llama-server 从 chat content 过滤。
			
 
				+> PaddleOCR-VL 之所以没踩坑，正是因为它的 OTSL token 本就是 `special: false`。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 不推荐的路径
			
 
				+
			
 
				+- **transformers + MPS**：Mac 上 MPS 跑视觉模型慢，且官方明确说 transformers 路径只支持 element-level（单元素识别），**不支持整页解析**，会破坏你现在的流水线。
			
 
				+- **vLLM**：官方加速方案依赖 CUDA/Docker GPU，Mac 上没有可用 GPU 后端。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+**建议**：先按方案 B 自转一份 f16，跑通后对比 1.5 看精度提升是否值得切换；同时关注社区 1.6 GGUF，出了直接换方案 A。需要的话我可以（切到 Agent 模式后）帮你把脚本改造成支持 `--jinja` 并参数化版本号，方便 1.5/1.6 一键切换。
			
 
				+
			
 
				+Citation: [InsiderLLM 指南](https://insiderllm.com/guides/paddleocr-vl-local-document-ocr/)、[1.6 官方页](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.6)、[1.5-GGUF 用法](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF)。
			
--- a/docs/ocr_tools/universal_doc_parser/PaddleOCR-VL表格文字丢失-OTSL补丁.md
+++ b/docs/ocr_tools/universal_doc_parser/PaddleOCR-VL表格文字丢失-OTSL补丁.md
@@ -0,0 +1,193 @@
 
				+# PaddleOCR-VL 表格文字丢失问题与 OTSL 运行时补丁
			
 
				+
			
 
				+> 关联文档：[`paddleocr_vl 1.6->GGUF.md`](./paddleocr_vl%201.6-%3EGGUF.md)、[`llama.cpp配置说明.md`](./llama.cpp配置说明.md)
			
 
				+
			
 
				+## 1. 背景与现象
			
 
				+
			
 
				+使用本机编译的 `llama.cpp`（`llama-server`）以 GGUF 形式部署 **PaddleOCR-VL-1.6** 模型，
			
 
				+经 `universal_doc_parser` 走 `bank_statement_yusys_paddleocr_local` 流程解析银行流水图片时，
			
 
				+输出 JSON（如 `陈3_微信图_page_001.json`）中的表格**只有表格结构（`<tr>/<td>` 骨架），所有单元格文字为空**。
			
 
				+
			
 
				+关键观察：
			
 
				+
			
 
				+- `llama-server` 日志显示模型**确实生成了大量包含文字的 token**，并非模型没输出。
			
 
				+- 模型原始输出（`_PredictResult.text`）是带文字的 OTSL，例如：
			
 
				+
			
 
				+  ```text
			
 
				+  交易明细对应时间段<fcel>2023-08-12 00:00:00至2024-08-11 23:59:59<lcel><lcel>...<nl><fcel>具体交易明细<lcel>...<nl><fcel>交易单号<fcel>交易时间<fcel>...
			
 
				+  ```
			
 
				+
			
 
				+- 即文字在「模型输出」阶段是完整的，是在**后处理转 HTML 时丢失**的。
			
 
				+
			
 
				+## 2. 根因分析
			
 
				+
			
 
				+### 2.1 OTSL 与转换入口
			
 
				+
			
 
				+PaddleOCR-VL 以 **OTSL（Open Table Structure Language）** 表达表格，结构 token 有：
			
 
				+`<nl>`（换行/换行）、`<fcel>`（首文本单元格）、`<ecel>`（空单元格）、`<lcel>/<ucel>/<xcel>`（跨列/跨行/跨格）。
			
 
				+
			
 
				+后处理由第三方库 `mineru_vl_utils` 负责，OTSL→HTML 的转换函数为
			
 
				+`mineru_vl_utils/post_process/otsl2html.py:convert_otsl_to_html`。
			
 
				+
			
 
				+### 2.2 文字丢失的直接原因
			
 
				+
			
 
				+`convert_otsl_to_html` 内部依次调用：
			
 
				+
			
 
				+1. `otsl_extract_tokens_and_text(otsl_content)` → 拆出 `tokens` 与 `mixed_texts`；
			
 
				+2. `otsl_parse_texts(mixed_texts, tokens)` → 用 `text_idx` 把文字回填到各 `TableCell`。
			
 
				+
			
 
				+`otsl_parse_texts` 的文本回填逻辑**假设 `mixed_texts` 以结构 token 开头**。
			
 
				+而 PaddleOCR-VL 的输出中，**整张表的第一个单元格缺少前导 `<fcel>` token**
			
 
				+（如上例直接以「交易明细对应时间段」纯文本打头）。
			
 
				+
			
 
				+这导致 `text_idx` 从一开始就**永久错位**，后续所有单元格都取不到对应文字，
			
 
				+最终 `table_cells` 里每个 cell 的 `text` 都是空字符串 —— 表格只剩骨架。
			
 
				+
			
 
				+### 2.3 完整调用链
			
 
				+
			
 
				+```
			
 
				+adapter.content_extract()                       # mineru_adapter.py:436
			
 
				+  → MinerUClient.content_extract()              # mineru_client.py:832
			
 
				+      blocks[0].content = output.text           # ← 原始 OTSL，文字尚在
			
 
				+      → helper.post_process()                   # :845
			
 
				+        → post_process() → simple_process()     # post_process/__init__.py:150
			
 
				+          → convert_otsl_to_html(content)       # __init__.py:95  ← 文字在此丢失
			
 
				+```
			
 
				+
			
 
				+> 重要结论：文字在 `convert_otsl_to_html` 内部就已丢失，**对最终 HTML 做后处理无法挽回**
			
 
				+> （空 `<td>` 里已经没有文字）。修复必须发生在该函数执行**之前**。
			
 
				+
			
 
				+## 3. 方案选型
			
 
				+
			
 
				+| 方案 | 说明 | 结论 |
			
 
				+|---|---|---|
			
 
				+| 改 `chat_template.jinja` | 该模板是**输入**提示词格式，管不到模型输出 | ❌ 无效 |
			
 
				+| 对最终 HTML 后处理补字 | 文字已在转换中丢失，无源可补 | ❌ 不可行 |
			
 
				+| 直接改 `site-packages` 源码 | 升级/重装即丢失，团队不同步 | ❌ 仅临时 |
			
 
				+| fork + `pip install -e` 自有分支 | 维护成本最高，且需改第三方项目 | ⚠️ 过重 |
			
 
				+| **运行时 monkey-patch（最终采用）** | 不改第三方源码、随本仓库版本化、可开关、升级不丢 | ✅ 采用 |
			
 
				+
			
 
				+### 为什么 monkey-patch 打在 `post_process.convert_otsl_to_html`
			
 
				+
			
 
				+`post_process/__init__.py` 顶部 `from .otsl2html import convert_otsl_to_html`，
			
 
				+其内部 `simple_process` / `_convert_pure_table_content_to_html` 在调用时
			
 
				+**按 `mineru_vl_utils.post_process` 模块全局名在运行时查找**该函数。
			
 
				+
			
 
				+因此只要替换 `mineru_vl_utils.post_process.convert_otsl_to_html` 这个名字，
			
 
				+即可拦截库内全部内部调用（`__init__.py:72` 与 `:95`），而无需改任何源码。
			
 
				+本仓库的 `mineru_adapter.py` 只调用 `content_extract` / `batch_content_extract`，
			
 
				+**没有**自行 import 该函数，所以无需额外覆盖其他命名空间。
			
 
				+
			
 
				+## 4. 最终实现
			
 
				+
			
 
				+### 4.1 补丁模块
			
 
				+
			
 
				+新增 `ocr_tools/universal_doc_parser/models/adapters/_mineru_vl_patches.py`，
			
 
				+核心逻辑：调用原始 `convert_otsl_to_html` 之前，若 OTSL 以纯文本（非 `<table`、非结构 token）打头，
			
 
				+则补一个前导 `<fcel>`：
			
 
				+
			
 
				+```python
			
 
				+def _make_otsl_normalizer(orig_convert):
			
 
				+    def _normalize_then_convert(otsl_content):
			
 
				+        if isinstance(otsl_content, str):
			
 
				+            stripped = otsl_content.lstrip()
			
 
				+            if (stripped
			
 
				+                    and not stripped.startswith("<table")
			
 
				+                    and not stripped.startswith(_OTSL_STRUCT_TOKENS)):
			
 
				+                otsl_content = "<fcel>" + stripped
			
 
				+        return orig_convert(otsl_content)
			
 
				+    _normalize_then_convert.__wrapped__ = orig_convert
			
 
				+    return _normalize_then_convert
			
 
				+```
			
 
				+
			
 
				+通过 `apply_once()` 应用，特性：
			
 
				+
			
 
				+- **幂等**：模块级 `_applied` 标志，仅首次真正打补丁。
			
 
				+- **失败大声**：上游接口改名/找不到 `convert_otsl_to_html` 时抛 `RuntimeError`，
			
 
				+  避免补丁静默失效后又开始丢字。
			
 
				+- **双重覆盖**：同时覆盖 `post_process.convert_otsl_to_html`（关键）与
			
 
				+  `otsl2html.convert_otsl_to_html`（兜底）。
			
 
				+
			
 
				+### 4.2 调用点（放在 `__init__`，覆盖 mineru 与 paddle 两条路径）
			
 
				+
			
 
				+> ⚠️ 实际生产走的是 **`PaddleVLRecognizer`**（`module: paddle`），它**继承** `MinerUVLRecognizer`
			
 
				+> 但**重写了 `initialize()`**（直接 `MinerUClient(...)`，未调用父类 `initialize`）。
			
 
				+> 因此补丁若只放在 `MinerUVLRecognizer.initialize()`，paddle 路径不会执行。
			
 
				+>
			
 
				+> 解决：把补丁调用放在 **`MinerUVLRecognizer.__init__`**。`PaddleVLRecognizer.__init__`
			
 
				+> 会经 `super().__init__(config)` 到达这里，于是 mineru / paddle / glmocr（均继承自该基类）
			
 
				+> 三条路径都会在创建识别器时应用补丁，且早于任何 `content_extract`。
			
 
				+
			
 
				+```python
			
 
				+class MinerUVLRecognizer(BaseVLRecognizer):
			
 
				+    def __init__(self, config):
			
 
				+        super().__init__(config)
			
 
				+        if not MINERU_AVAILABLE:
			
 
				+            raise ImportError("MinerU components not available")
			
 
				+        self.vlm_model = None
			
 
				+        self.max_image_size = config.get('max_image_size', 1568)
			
 
				+        self.resize_mode = config.get('resize_mode', 'max')
			
 
				+
			
 
				+        # 应用 mineru_vl_utils 运行时补丁（paddle 重写了 initialize，但其 __init__ 经 super 到达此处）
			
 
				+        try:
			
 
				+            from ._mineru_vl_patches import apply_once as _apply_mineru_vl_patches
			
 
				+            _apply_mineru_vl_patches()
			
 
				+        except Exception as e:
			
 
				+            # 补丁失败不阻断识别器创建，退回默认行为，但明确告警
			
 
				+            logger.warning(f"应用 mineru_vl_utils 补丁失败（退回默认行为，表格可能丢字）: {e}")
			
 
				+```
			
 
				+
			
 
				+> 关于失败语义：补丁模块 `apply_once()` 自身遵循「失败大声」（上游接口改名等会抛 `RuntimeError`）；
			
 
				+> 适配器调用点再用 `try/except` 兜底，把异常降级为 `logger.warning`，
			
 
				+> 避免一个修复补丁把整个识别器的创建搞挂。
			
 
				+
			
 
				+## 5. 验证
			
 
				+
			
 
				+在 `mineru` 环境下加载补丁并用真实 OTSL 片段验证：
			
 
				+
			
 
				+```bash
			
 
				+conda run -n mineru python -c "
			
 
				+import mineru_vl_utils.post_process as pp
			
 
				+import importlib.util
			
 
				+spec = importlib.util.spec_from_file_location('_mineru_vl_patches', '_mineru_vl_patches.py')
			
 
				+m = importlib.util.module_from_spec(spec); spec.loader.exec_module(m)
			
 
				+print('apply_once ->', m.apply_once())          # True
			
 
				+print('apply_once again ->', m.apply_once())    # False（幂等）
			
 
				+otsl = '交易明细对应时间段<fcel>X<lcel><nl><fcel>具体交易明细<lcel><nl>'
			
 
				+html = pp.convert_otsl_to_html(otsl)
			
 
				+print('首格文字保留:', '交易明细对应时间段' in html)   # True
			
 
				+print(html[:120])
			
 
				+"
			
 
				+```
			
 
				+
			
 
				+输出（节选）：
			
 
				+
			
 
				+```text
			
 
				+已应用 mineru_vl_utils 补丁：OTSL 整表首格 <fcel> 归一化
			
 
				+apply_once -> True
			
 
				+apply_once again -> False
			
 
				+首格文字保留: True
			
 
				+<table><tr><td>交易明细对应时间段</td><td colspan="2">X</td></tr>...
			
 
				+```
			
 
				+
			
 
				+首格文字 `交易明细对应时间段` 被正确保留，问题修复。
			
 
				+
			
 
				+## 6. 维护注意事项
			
 
				+
			
 
				+1. **不要再改 `site-packages` / `mineru-vl-utils` 源码**。临时改动已还原为原始状态，
			
 
				+   所有修复都集中在 `_mineru_vl_patches.py`，随本仓库版本化。
			
 
				+2. **升级 `mineru_vl_utils` 后**，请重跑第 5 节验证脚本；若上游已修复同名问题，
			
 
				+   可考虑移除本补丁；若 `convert_otsl_to_html` 被改名，`apply_once()` 会抛 `RuntimeError` 提示。
			
 
				+3. **新增第三方运行时修补**统一加到 `_mineru_vl_patches.py` 并由 `apply_once()` 串联，
			
 
				+   保持「补丁集中、可开关、可追溯」。
			
 
				+4. 若未来 Layout 检测器等其他路径也直接触发 OTSL 转换，由于补丁是进程级全局生效，
			
 
				+   只要在该路径初始化时同样调用过 `apply_once()` 即可（幂等，可安全重复调用）。
			
 
				+
			
 
				+## 7. 涉及文件
			
 
				+
			
 
				+| 文件 | 变更 |
			
 
				+|---|---|
			
 
				+| `models/adapters/_mineru_vl_patches.py` | 新增：运行时补丁模块 |
			
 
				+| `models/adapters/mineru_adapter.py` | `MinerUVLRecognizer.__init__` 接入 `apply_once()`（覆盖 paddle 继承路径） |
			
 
				+| `models/adapters/paddle_vl_adapter.py` | 无需改动：`PaddleVLRecognizer` 经 `super().__init__` 自动应用补丁 |
			
 
				+| `site-packages/.../otsl2html.py` | 还原为原始状态（移除临时改动） |
			
--- a/ocr_tools/daemons/mineru_local_daemon.sh
+++ b/ocr_tools/daemons/mineru_local_daemon.sh
@@ -23,13 +23,16 @@ PORT="8103"
 
				 HOST="0.0.0.0"
			
 
				 
			
 
				 # 本地 GGUF 模型路径（llama-server -hf 下载后的实际路径）
			
 
				-HF_CACHE="$HOME/models/hf_home/hub/models--mradermacher--MinerU2.5-Pro-2604-1.2B-GGUF/snapshots/70429e9c728b6a5e904f358a9936c17bd3f5f4b8"
			
 
				-MODEL_PATH="$HF_CACHE/MinerU2.5-Pro-2604-1.2B.Q8_0.gguf"
			
 
				-MMPROJ_PATH="$HF_CACHE/MinerU2.5-Pro-2604-1.2B.mmproj-Q8_0.gguf"
			
 
				+HF_CACHE="$HOME/models/MinerU2.5-Pro-2604-1.2B-GGUF"
			
 
				+MODEL_PATH="$HF_CACHE/MinerU2.5-Pro-2604-1.2B-F16.gguf"
			
 
				+MMPROJ_PATH="$HF_CACHE/MinerU2.5-Pro-2604-1.2B-F16-mmproj.gguf"
			
 
				 
			
 
				 # 模型别名（对外暴露的模型 ID，对应 yaml 中的 model 字段）
			
 
				 MODEL_NAME="MinerU2.5-Pro-2604-1.2B"
			
 
				 
			
 
				+# llama-server 执行文件
			
 
				+LLAMA_SERVER_EXECUTABLE="/Users/zhch158/workspace/repository.git/llama.cpp/build/bin/llama-server"
			
 
				+
			
 
				 # llama-server 参数
			
 
				 # 注意：MinerU2.5-Pro n_ctx_train=8192，设置 8192 即可
			
 
				 CONTEXT_SIZE="8192"          # 上下文长度（对齐模型 n_ctx_train=8192）
			
@@ -70,9 +73,7 @@ start() {
 
				     # 检查模型文件是否存在
			
 
				     if [ ! -f "$MODEL_PATH" ]; then
			
 
				         echo "❌ 主模型文件不存在: $MODEL_PATH"
			
 
				-        echo "请先运行以下命令下载模型:"
			
 
				-        echo "  llama-server -hf mradermacher/MinerU2.5-Pro-2604-1.2B-GGUF:Q8_0"
			
 
				-        echo "下载完成后更新脚本中的 HF_CACHE 路径（快照 hash 可能不同）"
			
 
				+        echo "请确认模型已下载到 llama.cpp 缓存目录"
			
 
				         return 1
			
 
				     fi
			
 
				 
			
@@ -82,15 +83,15 @@ start() {
 
				         return 1
			
 
				     fi
			
 
				 
			
 
				-    # 检查 llama-server 命令
			
 
				-    if ! command -v llama-server >/dev/null 2>&1; then
			
 
				-        echo "❌ llama-server 未找到"
			
 
				-        echo "请安装: brew install llama.cpp"
			
 
				+    # 检查 llama-server 执行文件（本机编译版本）
			
 
				+    if [ ! -x "$LLAMA_SERVER_EXECUTABLE" ]; then
			
 
				+        echo "❌ llama-server 执行文件不存在或不可执行: $LLAMA_SERVER_EXECUTABLE"
			
 
				+        echo "请确认已在本机编译 llama.cpp（cmake --build build）"
			
 
				         return 1
			
 
				     fi
			
 
				 
			
 
				-    echo "🔧 使用 llama-server: $(which llama-server)"
			
 
				-    echo "🔧 llama.cpp 版本: $(llama-server --version 2>&1 | head -1 || echo 'Unknown')"
			
 
				+    echo "🔧 使用 llama-server: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    echo "🔧 llama.cpp 版本: $("$LLAMA_SERVER_EXECUTABLE" --version 2>&1 | head -1 || echo 'Unknown')"
			
 
				 
			
 
				     echo "💻 系统信息:"
			
 
				     echo "  架构: $(uname -m)"
			
@@ -100,7 +101,7 @@ start() {
 
				     # 启动 llama-server
			
 
				     # 注意：MinerU2.5-Pro GGUF 内嵌推荐采样参数（top_k=1, top_p=0.001, temp=0.01），
			
 
				     #       llama-server 会自动应用，此处只设 --temp 0 确保确定性解码
			
 
				-    nohup llama-server \
			
 
				+    nohup "$LLAMA_SERVER_EXECUTABLE" \
			
 
				         -m "$MODEL_PATH" \
			
 
				         --mmproj "$MMPROJ_PATH" \
			
 
				         --alias $MODEL_NAME \
			
@@ -242,10 +243,12 @@ config() {
 
				 
			
 
				     echo ""
			
 
				     echo "🔧 环境检查:"
			
 
				-    echo "  llama-server: $(which llama-server 2>/dev/null || echo '未安装')"
			
 
				-    if command -v llama-server >/dev/null 2>&1; then
			
 
				-        LLAMA_VERSION=$(llama-server --version 2>&1 | head -1 || echo 'Unknown')
			
 
				+    echo "  llama-server: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    if [ -x "$LLAMA_SERVER_EXECUTABLE" ]; then
			
 
				+        LLAMA_VERSION=$("$LLAMA_SERVER_EXECUTABLE" --version 2>&1 | head -1 || echo 'Unknown')
			
 
				         echo "  版本: $LLAMA_VERSION"
			
 
				+    else
			
 
				+        echo "  ⚠️  执行文件不存在或不可执行"
			
 
				     fi
			
 
				     echo "  Conda: $(which conda 2>/dev/null || echo '未找到')"
			
 
				     echo "  当前 Python: $(which python 2>/dev/null || echo '未找到')"
			
@@ -275,7 +278,7 @@ test_api() {
 
				     response=$(curl -s --connect-timeout 10 http://127.0.0.1:$PORT/v1/models)
			
 
				     if [ $? -eq 0 ]; then
			
 
				         echo "✅ Models 端点可访问"
			
 
				-        echo "$response" | python3 -m json.tool 2>/dev/null || echo "$response"
			
 
				+        echo "$response" | python -m json.tool 2>/dev/null || echo "$response"
			
 
				     else
			
 
				         echo "❌ Models 端点不可访问"
			
 
				     fi
			
@@ -357,8 +360,8 @@ usage() {
 
				     echo "  ./mineru_local_daemon.sh test"
			
 
				     echo ""
			
 
				     echo "前置要求:"
			
 
				-    echo "  1. 安装 llama.cpp: brew install llama.cpp"
			
 
				-    echo "  2. 首次下载模型: llama-server -hf mradermacher/MinerU2.5-Pro-2604-1.2B-GGUF:Q8_0"
			
 
				+    echo "  1. 本机编译 llama.cpp，执行文件: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    echo "  2. 模型文件位于: $HF_CACHE"
			
 
				     echo "  3. conda 环境 mineru 已配置"
			
 
				 }
			
 
				 
			
--- a/ocr_tools/daemons/paddle_local_daemon_1.6.sh
+++ b/ocr_tools/daemons/paddle_local_daemon_1.6.sh
@@ -0,0 +1,395 @@
 
				+#!/bin/bash
			
 
				+# filepath: ocr_platform/ocr_tools/daemons/paddleocr_local_daemon.sh
			
 
				+# 对应: PaddleOCR-VL 本地 llama-server 服务（macOS），使用 GGUF 格式模型
			
 
				+# 适用于 Mac M4 Pro 48G，使用 Metal GPU 加速
			
 
				+# 模型下载地址: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF
			
 
				+
			
 
				+# unset https_proxy http_proxy HF_ENDPOINT
			
 
				+# llama-server -hf PaddlePaddle/PaddleOCR-VL-1.5-GGUF
			
 
				+# mv ~/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5.gguf  ~/models/paddleocr_vl
			
 
				+# mv ~/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5-mmproj.gguf  ~/models/paddleocr_vl
			
 
				+
			
 
				+# curl -X POST http://localhost:8102/v1/chat/completions -d @payload.json
			
 
				+
			
 
				+LOGDIR="$HOME/workspace/logs"
			
 
				+mkdir -p $LOGDIR
			
 
				+PIDFILE="$LOGDIR/paddleocr_llamaserver.pid"
			
 
				+LOGFILE="$LOGDIR/paddleocr_llamaserver.log"
			
 
				+
			
 
				+# 配置参数
			
 
				+CONDA_ENV="mineru"
			
 
				+PORT="8102"
			
 
				+HOST="0.0.0.0"
			
 
				+
			
 
				+# 本地 GGUF 模型路径（llama-server -hf 下载后的实际路径）
			
 
				+HF_CACHE="$HOME/models/PaddleOCR-VL-1.6-GGUF"
			
 
				+MODEL_PATH="$HF_CACHE/PaddleOCR-VL-1.6-F16.gguf"
			
 
				+MMPROJ_PATH="$HF_CACHE/PaddleOCR-VL-1.6-F16-mmproj.gguf"
			
 
				+
			
 
				+# 模型别名（对外暴露的模型 ID，对应 yaml 中的 model_name）
			
 
				+MODEL_NAME="PaddleOCR-VL-1.6"
			
 
				+
			
 
				+# llama-server 执行文件
			
 
				+LLAMA_SERVER_EXECUTABLE="/Users/zhch158/workspace/repository.git/llama.cpp/build/bin/llama-server"
			
 
				+
			
 
				+# llama-server 参数
			
 
				+CONTEXT_SIZE="16384"         # 上下文长度（需 >= max_tokens，推荐 8192-16384）
			
 
				+GPU_LAYERS="99"              # Metal GPU 层数（99 表示全部）
			
 
				+THREADS="8"                  # CPU 线程数（M4 Pro 建议值）
			
 
				+BATCH_SIZE="512"             # 批处理大小
			
 
				+UBATCH_SIZE="128"            # 微批处理大小
			
 
				+
			
 
				+# conda 环境激活
			
 
				+if [ -f "$HOME/anaconda3/etc/profile.d/conda.sh" ]; then
			
 
				+    source "$HOME/anaconda3/etc/profile.d/conda.sh"
			
 
				+    conda activate $CONDA_ENV
			
 
				+elif [ -f "$HOME/miniconda3/etc/profile.d/conda.sh" ]; then
			
 
				+    source "$HOME/miniconda3/etc/profile.d/conda.sh"
			
 
				+    conda activate $CONDA_ENV
			
 
				+elif [ -f "/opt/miniconda3/etc/profile.d/conda.sh" ]; then
			
 
				+    source /opt/miniconda3/etc/profile.d/conda.sh
			
 
				+    conda activate $CONDA_ENV
			
 
				+else
			
 
				+    echo "Warning: conda initialization file not found, trying direct path"
			
 
				+    export PATH="/opt/miniconda3/envs/$CONDA_ENV/bin:$PATH"
			
 
				+fi
			
 
				+
			
 
				+start() {
			
 
				+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
			
 
				+        echo "PaddleOCR-VL llama-server 已在运行"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    echo "启动 PaddleOCR-VL llama-server 守护进程..."
			
 
				+    echo "Host: $HOST, Port: $PORT"
			
 
				+    echo "主模型: $MODEL_PATH"
			
 
				+    echo "多模态投影器: $MMPROJ_PATH"
			
 
				+    echo "上下文长度: $CONTEXT_SIZE"
			
 
				+    echo "GPU 层数: $GPU_LAYERS (Metal)"
			
 
				+    echo "线程数: $THREADS"
			
 
				+
			
 
				+    # 检查模型文件是否存在
			
 
				+    if [ ! -f "$MODEL_PATH" ]; then
			
 
				+        echo "❌ 主模型文件不存在: $MODEL_PATH"
			
 
				+        echo "请确认模型已下载到 llama.cpp 缓存目录"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    if [ ! -f "$MMPROJ_PATH" ]; then
			
 
				+        echo "❌ 多模态投影器文件不存在: $MMPROJ_PATH"
			
 
				+        echo "请确认 mmproj 文件已下载"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    # 检查 llama-server 执行文件（本机编译版本）
			
 
				+    if [ ! -x "$LLAMA_SERVER_EXECUTABLE" ]; then
			
 
				+        echo "❌ llama-server 执行文件不存在或不可执行: $LLAMA_SERVER_EXECUTABLE"
			
 
				+        echo "请确认已在本机编译 llama.cpp（cmake --build build）"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    echo "🔧 使用 llama-server: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    echo "🔧 llama.cpp 版本: $("$LLAMA_SERVER_EXECUTABLE" --version 2>&1 | head -1 || echo 'Unknown')"
			
 
				+
			
 
				+    echo "💻 系统信息:"
			
 
				+    echo "  架构: $(uname -m)"
			
 
				+    echo "  系统: $(uname -s)"
			
 
				+    echo "  内存: $(sysctl -n hw.memsize | awk '{printf "%.1f GB", $1/1024/1024/1024}')"
			
 
				+
			
 
				+    # 启动 llama-server
			
 
				+    nohup "$LLAMA_SERVER_EXECUTABLE" \
			
 
				+        -m "$MODEL_PATH" \
			
 
				+        --mmproj "$MMPROJ_PATH" \
			
 
				+        --alias $MODEL_NAME \
			
 
				+        --host $HOST \
			
 
				+        --port $PORT \
			
 
				+        --media-path $HOME/workspace \
			
 
				+        -c $CONTEXT_SIZE \
			
 
				+        -ngl $GPU_LAYERS \
			
 
				+        -t $THREADS \
			
 
				+        -b $BATCH_SIZE \
			
 
				+        -ub $UBATCH_SIZE \
			
 
				+        --temp 0 \
			
 
				+        > $LOGFILE 2>&1 &
			
 
				+
			
 
				+    echo $! > $PIDFILE
			
 
				+    echo "✅ PaddleOCR-VL llama-server 已启动，PID: $(cat $PIDFILE)"
			
 
				+    echo "📋 日志文件: $LOGFILE"
			
 
				+    echo "🌐 服务 URL: http://$HOST:$PORT"
			
 
				+    echo "📖 OpenAI 兼容 API: http://localhost:$PORT/v1 (chat/completions, models)"
			
 
				+    echo ""
			
 
				+    echo "等待服务启动..."
			
 
				+    sleep 5
			
 
				+    status
			
 
				+}
			
 
				+
			
 
				+stop() {
			
 
				+    if [ ! -f $PIDFILE ]; then
			
 
				+        echo "PaddleOCR-VL llama-server 未在运行"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    PID=$(cat $PIDFILE)
			
 
				+    echo "停止 PaddleOCR-VL llama-server (PID: $PID)..."
			
 
				+
			
 
				+    kill $PID
			
 
				+
			
 
				+    for i in {1..30}; do
			
 
				+        if ! kill -0 $PID 2>/dev/null; then
			
 
				+            break
			
 
				+        fi
			
 
				+        echo "等待进程停止... ($i/30)"
			
 
				+        sleep 1
			
 
				+    done
			
 
				+
			
 
				+    if kill -0 $PID 2>/dev/null; then
			
 
				+        echo "强制终止进程..."
			
 
				+        kill -9 $PID
			
 
				+    fi
			
 
				+
			
 
				+    rm -f $PIDFILE
			
 
				+    echo "✅ PaddleOCR-VL llama-server 已停止"
			
 
				+}
			
 
				+
			
 
				+status() {
			
 
				+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
			
 
				+        PID=$(cat $PIDFILE)
			
 
				+        echo "✅ PaddleOCR-VL llama-server 正在运行 (PID: $PID)"
			
 
				+        echo "🌐 服务 URL: http://$HOST:$PORT"
			
 
				+        echo "📋 日志文件: $LOGFILE"
			
 
				+
			
 
				+        # 检查端口监听状态
			
 
				+        if lsof -nP -iTCP:$PORT -sTCP:LISTEN >/dev/null 2>&1; then
			
 
				+            echo "🔗 端口 $PORT 正在监听"
			
 
				+        else
			
 
				+            echo "⚠️  端口 $PORT 未在监听（服务可能正在启动）"
			
 
				+        fi
			
 
				+
			
 
				+        # 检查 API 响应
			
 
				+        if command -v curl >/dev/null 2>&1; then
			
 
				+            if curl -s --connect-timeout 2 http://127.0.0.1:$PORT/v1/models > /dev/null 2>&1; then
			
 
				+                echo "🎯 API 响应正常"
			
 
				+            else
			
 
				+                echo "⚠️  API 无响应（服务可能正在启动）"
			
 
				+            fi
			
 
				+        fi
			
 
				+
			
 
				+        # 显示进程内存使用
			
 
				+        if command -v ps >/dev/null 2>&1; then
			
 
				+            MEM=$(ps -o rss= -p $PID 2>/dev/null | awk '{printf "%.2f GB", $1/1024/1024}')
			
 
				+            if [ -n "$MEM" ]; then
			
 
				+                echo "💾 内存使用: $MEM"
			
 
				+            fi
			
 
				+        fi
			
 
				+
			
 
				+        if [ -f $LOGFILE ]; then
			
 
				+            echo "📄 最近日志（最后 3 行）:"
			
 
				+            tail -3 $LOGFILE | sed 's/^/  /'
			
 
				+        fi
			
 
				+    else
			
 
				+        echo "❌ PaddleOCR-VL llama-server 未在运行"
			
 
				+        if [ -f $PIDFILE ]; then
			
 
				+            echo "删除过期的 PID 文件..."
			
 
				+            rm -f $PIDFILE
			
 
				+        fi
			
 
				+    fi
			
 
				+}
			
 
				+
			
 
				+logs() {
			
 
				+    if [ -f $LOGFILE ]; then
			
 
				+        echo "📄 PaddleOCR-VL llama-server 日志:"
			
 
				+        echo "====================="
			
 
				+        tail -f $LOGFILE
			
 
				+    else
			
 
				+        echo "❌ 日志文件不存在: $LOGFILE"
			
 
				+    fi
			
 
				+}
			
 
				+
			
 
				+config() {
			
 
				+    echo "📋 当前配置:"
			
 
				+    echo "  Conda 环境: $CONDA_ENV"
			
 
				+    echo "  Host: $HOST"
			
 
				+    echo "  Port: $PORT"
			
 
				+    echo "  模型别名: $MODEL_NAME"
			
 
				+    echo "  主模型路径: $MODEL_PATH"
			
 
				+    echo "  多模态投影器: $MMPROJ_PATH"
			
 
				+    echo "  上下文长度: $CONTEXT_SIZE"
			
 
				+    echo "  GPU 层数: $GPU_LAYERS"
			
 
				+    echo "  线程数: $THREADS"
			
 
				+    echo "  批处理大小: $BATCH_SIZE"
			
 
				+    echo "  微批处理大小: $UBATCH_SIZE"
			
 
				+    echo "  PID 文件: $PIDFILE"
			
 
				+    echo "  日志文件: $LOGFILE"
			
 
				+
			
 
				+    echo ""
			
 
				+    echo "📦 模型文件检查:"
			
 
				+    if [ -f "$MODEL_PATH" ]; then
			
 
				+        SIZE=$(du -h "$MODEL_PATH" | cut -f1)
			
 
				+        echo "  ✅ 主模型存在 ($SIZE)"
			
 
				+    else
			
 
				+        echo "  ❌ 主模型不存在"
			
 
				+    fi
			
 
				+
			
 
				+    if [ -f "$MMPROJ_PATH" ]; then
			
 
				+        SIZE=$(du -h "$MMPROJ_PATH" | cut -f1)
			
 
				+        echo "  ✅ 多模态投影器存在 ($SIZE)"
			
 
				+    else
			
 
				+        echo "  ❌ 多模态投影器不存在"
			
 
				+    fi
			
 
				+
			
 
				+    echo ""
			
 
				+    echo "🔧 环境检查:"
			
 
				+    echo "  llama-server: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    if [ -x "$LLAMA_SERVER_EXECUTABLE" ]; then
			
 
				+        LLAMA_VERSION=$("$LLAMA_SERVER_EXECUTABLE" --version 2>&1 | head -1 || echo 'Unknown')
			
 
				+        echo "  版本: $LLAMA_VERSION"
			
 
				+    else
			
 
				+        echo "  ⚠️  执行文件不存在或不可执行"
			
 
				+    fi
			
 
				+    echo "  Conda: $(which conda 2>/dev/null || echo '未找到')"
			
 
				+    echo "  当前 Python: $(which python 2>/dev/null || echo '未找到')"
			
 
				+
			
 
				+    echo ""
			
 
				+    echo "💻 系统信息:"
			
 
				+    echo "  架构: $(uname -m)"
			
 
				+    echo "  系统版本: $(sw_vers -productVersion 2>/dev/null || echo 'Unknown')"
			
 
				+    echo "  总内存: $(sysctl -n hw.memsize 2>/dev/null | awk '{printf "%.1f GB", $1/1024/1024/1024}' || echo 'Unknown')"
			
 
				+    echo "  CPU 核心: $(sysctl -n hw.ncpu 2>/dev/null || echo 'Unknown')"
			
 
				+}
			
 
				+
			
 
				+test_api() {
			
 
				+    echo "🧪 测试 PaddleOCR-VL llama-server API..."
			
 
				+
			
 
				+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
			
 
				+        echo "❌ PaddleOCR-VL llama-server 服务未在运行"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    if ! command -v curl >/dev/null 2>&1; then
			
 
				+        echo "❌ curl 命令未找到"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    echo "📡 测试 /v1/models 端点..."
			
 
				+    response=$(curl -s --connect-timeout 10 http://127.0.0.1:$PORT/v1/models)
			
 
				+    if [ $? -eq 0 ]; then
			
 
				+        echo "✅ Models 端点可访问"
			
 
				+        echo "$response" | python -m json.tool 2>/dev/null || echo "$response"
			
 
				+    else
			
 
				+        echo "❌ Models 端点不可访问"
			
 
				+    fi
			
 
				+
			
 
				+    echo ""
			
 
				+    echo "📡 测试 /health 端点..."
			
 
				+    health=$(curl -s --connect-timeout 5 http://127.0.0.1:$PORT/health)
			
 
				+    if [ $? -eq 0 ]; then
			
 
				+        echo "✅ Health 端点: $health"
			
 
				+    else
			
 
				+        echo "⚠️  Health 端点不可访问"
			
 
				+    fi
			
 
				+}
			
 
				+
			
 
				+test_client() {
			
 
				+    echo "🧪 测试 PaddleOCR-VL 与 llama-server 集成..."
			
 
				+
			
 
				+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
			
 
				+        echo "❌ PaddleOCR-VL llama-server 服务未在运行，请先启动: $0 start"
			
 
				+        return 1
			
 
				+    fi
			
 
				+
			
 
				+    CONFIG_FILE="/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/config/bank_statement_paddleocr_local.yaml"
			
 
				+    
			
 
				+    echo "📄 配置文件: $CONFIG_FILE"
			
 
				+    echo ""
			
 
				+    echo "确保配置文件中 vl_recognition.api_url 指向: http://localhost:$PORT/v1/chat/completions"
			
 
				+    echo ""
			
 
				+    echo "测试命令示例:"
			
 
				+    echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
			
 
				+    echo "  conda activate mineru"
			
 
				+    echo "  python parse.py --input /path/to/test/image.png --config $CONFIG_FILE --debug"
			
 
				+    echo ""
			
 
				+    echo "或者使用 curl 直接测试 API:"
			
 
				+    echo "  curl -X POST http://localhost:$PORT/v1/chat/completions \\"
			
 
				+    echo "    -H 'Content-Type: application/json' \\"
			
 
				+    echo "    -d '{"
			
 
				+    echo "      \"model\": \"paddleocr-vl\","
			
 
				+    echo "      \"messages\": ["
			
 
				+    echo "        {"
			
 
				+    echo "          \"role\": \"user\","
			
 
				+    echo "          \"content\": ["
			
 
				+    echo "            {\"type\": \"text\", \"text\": \"Table Recognition:\"},"
			
 
				+    echo "            {\"type\": \"image_url\", \"image_url\": {\"url\": \"file:///path/to/image.png\"}}"
			
 
				+    echo "          ]"
			
 
				+    echo "        }"
			
 
				+    echo "      ],"
			
 
				+    echo "      \"max_tokens\": 4096"
			
 
				+    echo "    }'"
			
 
				+}
			
 
				+
			
 
				+usage() {
			
 
				+    echo "PaddleOCR-VL llama-server 服务守护进程（macOS）"
			
 
				+    echo "==========================================="
			
 
				+    echo "用法: $0 {start|stop|restart|status|logs|config|test|test-client}"
			
 
				+    echo ""
			
 
				+    echo "命令:"
			
 
				+    echo "  start       - 启动 PaddleOCR-VL llama-server 服务"
			
 
				+    echo "  stop        - 停止 PaddleOCR-VL llama-server 服务"
			
 
				+    echo "  restart     - 重启 PaddleOCR-VL llama-server 服务"
			
 
				+    echo "  status      - 显示服务状态和资源使用"
			
 
				+    echo "  logs        - 显示服务日志（跟踪模式）"
			
 
				+    echo "  config      - 显示当前配置"
			
 
				+    echo "  test        - 测试 /v1/models API 端点"
			
 
				+    echo "  test-client - 显示如何测试与配置文件集成"
			
 
				+    echo ""
			
 
				+    echo "配置（编辑脚本修改）:"
			
 
				+    echo "  Host: $HOST"
			
 
				+    echo "  Port: $PORT"
			
 
				+    echo "  主模型: $MODEL_PATH"
			
 
				+    echo "  多模态投影器: $MMPROJ_PATH"
			
 
				+    echo "  上下文长度: $CONTEXT_SIZE"
			
 
				+    echo "  GPU 层数: $GPU_LAYERS (Metal)"
			
 
				+    echo ""
			
 
				+    echo "示例:"
			
 
				+    echo "  ./paddleocr_local_daemon.sh start"
			
 
				+    echo "  ./paddleocr_local_daemon.sh status"
			
 
				+    echo "  ./paddleocr_local_daemon.sh logs"
			
 
				+    echo "  ./paddleocr_local_daemon.sh test"
			
 
				+    echo ""
			
 
				+    echo "前置要求:"
			
 
				+    echo "  1. 本机编译 llama.cpp，执行文件: $LLAMA_SERVER_EXECUTABLE"
			
 
				+    echo "  2. 模型文件位于: $HF_CACHE"
			
 
				+    echo "  3. conda 环境 mineru 已配置"
			
 
				+}
			
 
				+
			
 
				+case "$1" in
			
 
				+    start)
			
 
				+        start
			
 
				+        ;;
			
 
				+    stop)
			
 
				+        stop
			
 
				+        ;;
			
 
				+    restart)
			
 
				+        stop
			
 
				+        sleep 3
			
 
				+        start
			
 
				+        ;;
			
 
				+    status)
			
 
				+        status
			
 
				+        ;;
			
 
				+    logs)
			
 
				+        logs
			
 
				+        ;;
			
 
				+    config)
			
 
				+        config
			
 
				+        ;;
			
 
				+    test)
			
 
				+        test_api
			
 
				+        ;;
			
 
				+    test-client)
			
 
				+        test_client
			
 
				+        ;;
			
 
				+    *)
			
 
				+        usage
			
 
				+        exit 1
			
 
				+        ;;
			
 
				+esac
			
--- a/ocr_tools/model_doctor/README.md
+++ b/ocr_tools/model_doctor/README.md
@@ -0,0 +1,93 @@
 
				+# model_doctor —— 模型变更巡检工具
			
 
				+
			
 
				+`ocr_platform` 用到的模型很多、来源各异且在不断升级。本工具用「**清单 → 指纹 → 与基线比对**」
			
 
				+的方式，帮你一眼看出**哪些模型变了 / 缺失了 / 服务不可达 / HF 远端有更新**，避免悄无声息的版本漂移
			
 
				+（例如 daemon 后面把 PaddleOCR-VL 从 1.5 换成 1.6）。
			
 
				+
			
 
				+## 目录结构
			
 
				+
			
 
				+| 文件 | 说明 |
			
 
				+|---|---|
			
 
				+| `model_registry.yaml` | **手工维护**的模型清单，覆盖四类来源 |
			
 
				+| `model_doctor.py` | 巡检 CLI（采集指纹 / 比对 / 报告） |
			
 
				+| `models.lock.json` | 指纹基线（由 `update-lock` 生成，建议纳入 git） |
			
 
				+
			
 
				+## 四类模型来源（kind）
			
 
				+
			
 
				+| kind | 含义 | 指纹内容 | 「变化」如何被发现 |
			
 
				+|---|---|---|---|
			
 
				+| `hf` | HuggingFace 仓库（自动下载，缓存于 `defaults.hf_hub_dir`） | 本地快照 `local_revision`；`--online` 时附 `remote_revision` | 本地 revision 变化；远端 commit 与本地不同（`--online`） |
			
 
				+| `local_file` | 本地单个权重文件或目录 | `size`+`mtime`（`--hash` 加快速 sha256） | 文件被替换、大小/时间变化、缺失 |
			
 
				+| `daemon` | HTTP 服务（llama-server / vllm）+ 关联本地 GGUF 资产 | `/v1/models` 返回的 `served_models` + 各 asset 的 `size`+`mtime` | 服务不可达、声明的模型 id 不符、GGUF 文件被换 |
			
 
				+| `mineru` | MinerU 内置模型 | `package_version` + `model_root` 目录聚合指纹 | MinerU 包升级、内置模型目录变化 |
			
 
				+
			
 
				+## 常用命令
			
 
				+
			
 
				+> 建议在 conda 环境 `mineru` 下运行。
			
 
				+
			
 
				+```bash
			
 
				+cd ocr_tools/model_doctor
			
 
				+
			
 
				+# 列出清单
			
 
				+conda run -n mineru python model_doctor.py list
			
 
				+
			
 
				+# 体检（与基线比对）；有缺失/不可达/远端更新时退出码非 0
			
 
				+conda run -n mineru python model_doctor.py check
			
 
				+
			
 
				+# 体检 + 查 HF 远端最新 commit + 对本地文件算快速 sha256（更敏感、更慢）
			
 
				+conda run -n mineru python model_doctor.py check --online --hash
			
 
				+
			
 
				+# 确认变更合理后，把当前指纹固化为新基线
			
 
				+conda run -n mineru python model_doctor.py update-lock
			
 
				+
			
 
				+# 只打印当前采集到的指纹（不比对，便于排查）
			
 
				+conda run -n mineru python model_doctor.py show
			
 
				+```
			
 
				+
			
 
				+### 选项
			
 
				+
			
 
				+| 选项 | 作用 |
			
 
				+|---|---|
			
 
				+| `--online` | `hf` 条目额外查询远端最新 commit 并比对（需联网） |
			
 
				+| `--hash` | 本地文件/目录额外计算快速 sha256（头 8MB + 尾 8MB + size） |
			
 
				+| `--strict` | `check` 时把「指纹变化」也算失败（默认仅缺失/不可达/远端更新/新增才非 0 退出） |
			
 
				+| `--registry` / `--lock` | 指定清单 / 基线文件路径 |
			
 
				+
			
 
				+## 报告符号
			
 
				+
			
 
				+| 符号 | 含义 |
			
 
				+|---|---|
			
 
				+| ✅ | 未变化 |
			
 
				+| ⚠️ | 指纹变化（列出具体字段 diff） |
			
 
				+| 🔺 | HF 远端有更新（`--online`） |
			
 
				+| ❌ | 缺失 / 服务不可达 |
			
 
				+| 🆕 | 新增条目（基线中无记录，需 `update-lock`） |
			
 
				+| 🗑 | 基线中存在但 registry 已移除 |
			
 
				+| ·  | 跳过（`enabled: false`） |
			
 
				+
			
 
				+## 典型工作流
			
 
				+
			
 
				+1. 新增/升级模型 → 编辑 `model_registry.yaml`（增删条目或改路径/repo）。
			
 
				+2. `check` 看差异是否符合预期。
			
 
				+3. 确认无误 → `update-lock` 更新基线，并把 `models.lock.json` 一起提交。
			
 
				+4. 日常/CI/定时任务里跑 `check`；非 0 退出即代表「有人动了模型，需要关注」。
			
 
				+
			
 
				+可挂载的触发点：
			
 
				+- 流程启动前 `check`（缺失直接拦截，避免跑到一半报错）；
			
 
				+- `launchd`/`cron` 定时 `check --online` + 通知，监控 HF 远端更新；
			
 
				+- git `pre-commit`：改了 config 的 `model_dir` 时校验本地是否已下载。
			
 
				+
			
 
				+## 指纹策略说明
			
 
				+
			
 
				+- **大文件**默认只用 `size`+`mtime`（快、可离线）；`--hash` 时用「头尾各 8MB + size」的快速 sha256，
			
 
				+  兼顾敏感度与速度，不全量读取 GB 级 GGUF。
			
 
				+- **HF revision**：读取 HF 缓存 `models--{org}--{name}/refs/main`（即本地 commit）；
			
 
				+  `--online` 用 `huggingface_hub.HfApi().model_info(repo_id).sha` 取远端最新 commit 比对，无需下载。
			
 
				+- **daemon**：`/v1/models` 仅反映「服务声明的 model id」；真实权重变化靠 `assets` 的本地 GGUF 指纹兜底。
			
 
				+
			
 
				+## 维护提示
			
 
				+
			
 
				+- 内网/未启动的服务、可选模型已设为 `enabled: false`，体检时跳过、不报红；需要时改为 `true`。
			
 
				+- `mineru-builtin` 的 `model_root` 当前指向 `modelscope_cache`，若你的 MinerU 内置模型实际下载在
			
 
				+  `hf_home` 或别处，请按实修改该路径（留空 `null` 则只校验包版本）。
			
 
				+- `models.lock.json` 是「期望状态」，请在确认变更合理后才 `update-lock`，并随代码提交，便于团队对齐。
			
--- a/ocr_tools/model_doctor/model_doctor.py
+++ b/ocr_tools/model_doctor/model_doctor.py
@@ -0,0 +1,433 @@
 
				+#!/usr/bin/env python
			
 
				+"""model_doctor —— 模型变更巡检工具（方案 B）。
			
 
				+
			
 
				+依据手工维护的 model_registry.yaml，对四类模型来源采集「指纹」，
			
 
				+与基线 models.lock.json 比对，报告哪些模型发生了变化 / 缺失 / 服务不可达，
			
 
				+以及（可选）HF 远端是否有更新。
			
 
				+
			
 
				+子命令：
			
 
				+    list         列出 registry 中的模型条目
			
 
				+    show         采集并打印当前指纹（不比对）
			
 
				+    check        采集并与 lock 基线比对，输出报告（有变更/缺失则退出码非 0）
			
 
				+    update-lock  采集并把当前指纹固化为新的 lock 基线
			
 
				+
			
 
				+用法（建议在 conda 环境 mineru 下）：
			
 
				+    conda run -n mineru python model_doctor.py check
			
 
				+    conda run -n mineru python model_doctor.py check --online --hash
			
 
				+    conda run -n mineru python model_doctor.py update-lock
			
 
				+"""
			
 
				+
			
 
				+from __future__ import annotations
			
 
				+
			
 
				+import argparse
			
 
				+import hashlib
			
 
				+import json
			
 
				+import os
			
 
				+import sys
			
 
				+import urllib.request
			
 
				+from datetime import datetime
			
 
				+from pathlib import Path
			
 
				+
			
 
				+try:
			
 
				+    import yaml
			
 
				+except ImportError:
			
 
				+    sys.stderr.write("缺少依赖 PyYAML，请在 mineru 环境安装：conda run -n mineru pip install pyyaml\n")
			
 
				+    raise
			
 
				+
			
 
				+HERE = Path(__file__).resolve().parent
			
 
				+DEFAULT_REGISTRY = HERE / "model_registry.yaml"
			
 
				+DEFAULT_LOCK = HERE / "models.lock.json"
			
 
				+
			
 
				+# 报告状态符号
			
 
				+SYM = {
			
 
				+    "ok": "✅",
			
 
				+    "changed": "⚠️ ",
			
 
				+    "remote_update": "🔺",
			
 
				+    "missing": "❌",
			
 
				+    "unreachable": "❌",
			
 
				+    "new": "🆕",
			
 
				+    "removed": "🗑 ",
			
 
				+    "skipped": "· ",
			
 
				+}
			
 
				+
			
 
				+# 目录指纹遍历上限，避免误指向超大目录卡死
			
 
				+_MAX_DIR_FILES = 20000
			
 
				+
			
 
				+
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+# 通用指纹工具
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+def _fast_sha256(path: Path, head_tail_mb: int = 8) -> str:
			
 
				+    """对大文件取「头 + 尾 + 大小」的快速 sha256，避免全量读取。"""
			
 
				+    size = path.stat().st_size
			
 
				+    chunk = head_tail_mb * 1024 * 1024
			
 
				+    h = hashlib.sha256()
			
 
				+    h.update(str(size).encode())
			
 
				+    with path.open("rb") as f:
			
 
				+        h.update(f.read(chunk))
			
 
				+        if size > chunk * 2:
			
 
				+            f.seek(-chunk, os.SEEK_END)
			
 
				+            h.update(f.read(chunk))
			
 
				+    return h.hexdigest()
			
 
				+
			
 
				+
			
 
				+def _file_fp(path: Path, do_hash: bool) -> dict:
			
 
				+    if not path.exists():
			
 
				+        return {"exists": False}
			
 
				+    st = path.stat()
			
 
				+    fp = {
			
 
				+        "exists": True,
			
 
				+        "size": st.st_size,
			
 
				+        "mtime": int(st.st_mtime),
			
 
				+    }
			
 
				+    if do_hash and path.is_file():
			
 
				+        fp["sha256_fast"] = _fast_sha256(path)
			
 
				+    return fp
			
 
				+
			
 
				+
			
 
				+def _dir_fp(path: Path, do_hash: bool) -> dict:
			
 
				+    """目录指纹：聚合 (相对路径, size, mtime) 排序后的 sha256。"""
			
 
				+    if not path.exists():
			
 
				+        return {"exists": False}
			
 
				+    files = []
			
 
				+    count = 0
			
 
				+    for p in sorted(path.rglob("*")):
			
 
				+        if p.is_file():
			
 
				+            count += 1
			
 
				+            if count > _MAX_DIR_FILES:
			
 
				+                files.append(("<truncated>", -1, -1))
			
 
				+                break
			
 
				+            st = p.stat()
			
 
				+            files.append((str(p.relative_to(path)), st.st_size, int(st.st_mtime)))
			
 
				+    h = hashlib.sha256()
			
 
				+    total_size = 0
			
 
				+    for rel, size, mtime in files:
			
 
				+        h.update(f"{rel}|{size}|{mtime}\n".encode())
			
 
				+        if size > 0:
			
 
				+            total_size += size
			
 
				+    fp = {
			
 
				+        "exists": True,
			
 
				+        "file_count": len([f for f in files if f[1] >= 0]),
			
 
				+        "total_size": total_size,
			
 
				+        "tree_sha256": h.hexdigest(),
			
 
				+    }
			
 
				+    return fp
			
 
				+
			
 
				+
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+# 各 kind 的指纹采集
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+def fp_local_file(entry: dict, defaults: dict) -> dict:
			
 
				+    do_hash = entry.get("hash", defaults.get("hash", False))
			
 
				+    path = Path(os.path.expanduser(entry["path"]))
			
 
				+    fp = _dir_fp(path, do_hash) if path.is_dir() else _file_fp(path, do_hash)
			
 
				+    status = "ok" if fp.get("exists") else "missing"
			
 
				+    return {"status": status, "fingerprint": fp}
			
 
				+
			
 
				+
			
 
				+def fp_hf(entry: dict, defaults: dict) -> dict:
			
 
				+    repo_id = entry["repo_id"]
			
 
				+    hub_dir = Path(os.path.expanduser(entry.get("cache_dir", defaults["hf_hub_dir"])))
			
 
				+    repo_dir = hub_dir / ("models--" + repo_id.replace("/", "--"))
			
 
				+    fp: dict = {"repo_id": repo_id}
			
 
				+    local_rev = None
			
 
				+    if repo_dir.exists():
			
 
				+        refs_main = repo_dir / "refs" / "main"
			
 
				+        if refs_main.exists():
			
 
				+            local_rev = refs_main.read_text().strip()
			
 
				+        else:
			
 
				+            snaps = repo_dir / "snapshots"
			
 
				+            if snaps.exists():
			
 
				+                cand = sorted([d.name for d in snaps.iterdir() if d.is_dir()])
			
 
				+                local_rev = cand[-1] if cand else None
			
 
				+        fp["local_revision"] = local_rev
			
 
				+        fp["cached"] = True
			
 
				+    else:
			
 
				+        fp["cached"] = False
			
 
				+
			
 
				+    status = "ok" if fp.get("cached") else "missing"
			
 
				+
			
 
				+    # 可选：查远端最新 commit
			
 
				+    if entry.get("online", defaults.get("online", False)):
			
 
				+        try:
			
 
				+            from huggingface_hub import HfApi
			
 
				+
			
 
				+            remote_sha = HfApi().model_info(repo_id).sha
			
 
				+            fp["remote_revision"] = remote_sha
			
 
				+            if local_rev and remote_sha and local_rev != remote_sha:
			
 
				+                status = "remote_update"
			
 
				+        except Exception as e:  # 网络不可达等
			
 
				+            fp["remote_error"] = str(e)
			
 
				+    return {"status": status, "fingerprint": fp}
			
 
				+
			
 
				+
			
 
				+def fp_daemon(entry: dict, defaults: dict) -> dict:
			
 
				+    do_hash = entry.get("hash", defaults.get("hash", False))
			
 
				+    timeout = entry.get("daemon_timeout", defaults.get("daemon_timeout", 3))
			
 
				+    url = entry["server_url"].rstrip("/") + "/v1/models"
			
 
				+    fp: dict = {"server_url": entry["server_url"]}
			
 
				+    status = "ok"
			
 
				+
			
 
				+    try:
			
 
				+        req = urllib.request.Request(url, headers={"Accept": "application/json"})
			
 
				+        with urllib.request.urlopen(req, timeout=timeout) as resp:
			
 
				+            data = json.loads(resp.read().decode())
			
 
				+        served = [m.get("id") for m in data.get("data", [])]
			
 
				+        fp["reachable"] = True
			
 
				+        fp["served_models"] = served
			
 
				+        expect = entry.get("served_model")
			
 
				+        if expect and expect not in served:
			
 
				+            fp["served_mismatch"] = {"expect": expect, "actual": served}
			
 
				+            status = "changed"
			
 
				+    except Exception as e:
			
 
				+        fp["reachable"] = False
			
 
				+        fp["error"] = str(e)
			
 
				+        status = "unreachable"
			
 
				+
			
 
				+    # 本地 GGUF 资产指纹（即使服务不可达也采集，便于发现文件被换）
			
 
				+    assets = entry.get("assets") or []
			
 
				+    if assets:
			
 
				+        fp["assets"] = {}
			
 
				+        for a in assets:
			
 
				+            p = Path(os.path.expanduser(a))
			
 
				+            afp = _file_fp(p, do_hash)
			
 
				+            fp["assets"][a] = afp
			
 
				+            if not afp.get("exists"):
			
 
				+                status = "missing" if status == "ok" else status
			
 
				+    return {"status": status, "fingerprint": fp}
			
 
				+
			
 
				+
			
 
				+def fp_mineru(entry: dict, defaults: dict) -> dict:
			
 
				+    do_hash = entry.get("hash", defaults.get("hash", False))
			
 
				+    pkg = entry.get("package", "mineru")
			
 
				+    fp: dict = {"package": pkg}
			
 
				+    status = "ok"
			
 
				+    try:
			
 
				+        import importlib.metadata as md
			
 
				+
			
 
				+        fp["package_version"] = md.version(pkg)
			
 
				+    except Exception as e:
			
 
				+        fp["package_error"] = str(e)
			
 
				+        status = "missing"
			
 
				+
			
 
				+    root = entry.get("model_root")
			
 
				+    if root:
			
 
				+        p = Path(os.path.expanduser(root))
			
 
				+        rfp = _dir_fp(p, do_hash)
			
 
				+        fp["model_root"] = str(p)
			
 
				+        fp["model_root_fp"] = rfp
			
 
				+        if not rfp.get("exists"):
			
 
				+            status = "missing" if status == "ok" else status
			
 
				+    return {"status": status, "fingerprint": fp}
			
 
				+
			
 
				+
			
 
				+_COLLECTORS = {
			
 
				+    "local_file": fp_local_file,
			
 
				+    "hf": fp_hf,
			
 
				+    "daemon": fp_daemon,
			
 
				+    "mineru": fp_mineru,
			
 
				+}
			
 
				+
			
 
				+
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+# registry / lock 读写与比对
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+def load_registry(path: Path) -> dict:
			
 
				+    with path.open("r", encoding="utf-8") as f:
			
 
				+        reg = yaml.safe_load(f)
			
 
				+    reg.setdefault("defaults", {})
			
 
				+    reg.setdefault("models", [])
			
 
				+    return reg
			
 
				+
			
 
				+
			
 
				+def collect(reg: dict, online: bool, do_hash: bool) -> dict:
			
 
				+    defaults = dict(reg.get("defaults", {}))
			
 
				+    if online:
			
 
				+        defaults["online"] = True
			
 
				+    if do_hash:
			
 
				+        defaults["hash"] = True
			
 
				+
			
 
				+    snapshot = {}
			
 
				+    for entry in reg.get("models", []):
			
 
				+        name = entry["name"]
			
 
				+        if not entry.get("enabled", True):
			
 
				+            snapshot[name] = {"kind": entry.get("kind"), "status": "skipped", "fingerprint": {}}
			
 
				+            continue
			
 
				+        kind = entry.get("kind")
			
 
				+        collector = _COLLECTORS.get(kind)
			
 
				+        if collector is None:
			
 
				+            snapshot[name] = {"kind": kind, "status": "missing",
			
 
				+                              "fingerprint": {"error": f"未知 kind: {kind}"}}
			
 
				+            continue
			
 
				+        try:
			
 
				+            result = collector(entry, defaults)
			
 
				+        except Exception as e:
			
 
				+            result = {"status": "missing", "fingerprint": {"error": str(e)}}
			
 
				+        result["kind"] = kind
			
 
				+        result["used_by"] = entry.get("used_by", [])
			
 
				+        snapshot[name] = result
			
 
				+    return snapshot
			
 
				+
			
 
				+
			
 
				+def load_lock(path: Path) -> dict:
			
 
				+    if not path.exists():
			
 
				+        return {}
			
 
				+    with path.open("r", encoding="utf-8") as f:
			
 
				+        return json.load(f).get("models", {})
			
 
				+
			
 
				+
			
 
				+def save_lock(path: Path, snapshot: dict) -> None:
			
 
				+    payload = {
			
 
				+        "generated_at": datetime.now().astimezone().isoformat(timespec="seconds"),
			
 
				+        "models": snapshot,
			
 
				+    }
			
 
				+    with path.open("w", encoding="utf-8") as f:
			
 
				+        json.dump(payload, f, ensure_ascii=False, indent=2)
			
 
				+
			
 
				+
			
 
				+def diff_fp(old: dict, new: dict) -> list:
			
 
				+    """返回发生变化的字段路径（浅层 + 一层嵌套）。"""
			
 
				+    changes = []
			
 
				+    keys = set(old.keys()) | set(new.keys())
			
 
				+    for k in sorted(keys):
			
 
				+        ov, nv = old.get(k), new.get(k)
			
 
				+        if isinstance(ov, dict) and isinstance(nv, dict):
			
 
				+            for sk in sorted(set(ov) | set(nv)):
			
 
				+                if ov.get(sk) != nv.get(sk):
			
 
				+                    changes.append(f"{k}.{sk}: {ov.get(sk)} → {nv.get(sk)}")
			
 
				+        elif ov != nv:
			
 
				+            changes.append(f"{k}: {ov} → {nv}")
			
 
				+    return changes
			
 
				+
			
 
				+
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+# 子命令
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+def cmd_list(reg: dict) -> int:
			
 
				+    print(f"模型清单（{len(reg.get('models', []))} 条）：\n")
			
 
				+    for e in reg.get("models", []):
			
 
				+        flag = "  " if e.get("enabled", True) else "× "
			
 
				+        used = "；".join(e.get("used_by", []))
			
 
				+        print(f"{flag}[{e.get('kind'):<10}] {e['name']}")
			
 
				+        if used:
			
 
				+            print(f"                用于：{used}")
			
 
				+    print("\n（× 表示 enabled: false，体检时跳过）")
			
 
				+    return 0
			
 
				+
			
 
				+
			
 
				+def cmd_show(reg: dict, online: bool, do_hash: bool) -> int:
			
 
				+    snap = collect(reg, online, do_hash)
			
 
				+    print(json.dumps(snap, ensure_ascii=False, indent=2))
			
 
				+    return 0
			
 
				+
			
 
				+
			
 
				+def cmd_check(reg: dict, lock_path: Path, online: bool, do_hash: bool, strict: bool) -> int:
			
 
				+    new = collect(reg, online, do_hash)
			
 
				+    old = load_lock(lock_path)
			
 
				+
			
 
				+    has_baseline = bool(old)
			
 
				+    problems = 0      # missing / unreachable / remote_update
			
 
				+    changes = 0       # 指纹变化
			
 
				+    news = 0          # 新增（lock 中无记录）
			
 
				+
			
 
				+    print(f"模型体检报告  baseline={'有' if has_baseline else '无（首次，请先 update-lock）'}\n")
			
 
				+
			
 
				+    for name, cur in new.items():
			
 
				+        kind = cur.get("kind")
			
 
				+        status = cur.get("status")
			
 
				+
			
 
				+        if status == "skipped":
			
 
				+            print(f"{SYM['skipped']} {name:<28} [{kind}] 跳过（disabled）")
			
 
				+            continue
			
 
				+
			
 
				+        prev = old.get(name)
			
 
				+        if prev is None:
			
 
				+            news += 1
			
 
				+            print(f"{SYM['new']} {name:<28} [{kind}] 新增条目（基线中无记录）")
			
 
				+            continue
			
 
				+
			
 
				+        # 状态类问题优先
			
 
				+        if status in ("missing", "unreachable"):
			
 
				+            problems += 1
			
 
				+            detail = cur["fingerprint"].get("error", "")
			
 
				+            print(f"{SYM[status]} {name:<28} [{kind}] {status} {detail}")
			
 
				+            continue
			
 
				+        if status == "remote_update":
			
 
				+            problems += 1
			
 
				+            fpr = cur["fingerprint"]
			
 
				+            print(f"{SYM['remote_update']} {name:<28} [{kind}] HF 远端有更新："
			
 
				+                  f"{fpr.get('local_revision')} → {fpr.get('remote_revision')}")
			
 
				+            continue
			
 
				+
			
 
				+        # 指纹比对
			
 
				+        fp_changes = diff_fp(prev.get("fingerprint", {}), cur.get("fingerprint", {}))
			
 
				+        if fp_changes:
			
 
				+            changes += 1
			
 
				+            print(f"{SYM['changed']} {name:<28} [{kind}] 指纹变化：")
			
 
				+            for c in fp_changes:
			
 
				+                print(f"        - {c}")
			
 
				+        else:
			
 
				+            print(f"{SYM['ok']} {name:<28} [{kind}] 未变化")
			
 
				+
			
 
				+    # 基线中存在但 registry 已删除
			
 
				+    removed = [n for n in old if n not in new]
			
 
				+    for n in removed:
			
 
				+        print(f"{SYM['removed']} {n:<28} 基线中存在但 registry 已移除")
			
 
				+
			
 
				+    print("\n" + "-" * 60)
			
 
				+    print(f"问题(缺失/不可达/远端更新)={problems}  指纹变化={changes}  "
			
 
				+          f"新增={news}  移除={len(removed)}")
			
 
				+
			
 
				+    if not has_baseline:
			
 
				+        print("提示：尚无基线，运行 `update-lock` 生成。")
			
 
				+        return 1
			
 
				+
			
 
				+    fail = problems + (changes if strict else 0) + news
			
 
				+    if problems and not strict:
			
 
				+        # 即使非 strict，缺失/不可达也应非 0 退出
			
 
				+        return 1
			
 
				+    return 1 if fail else 0
			
 
				+
			
 
				+
			
 
				+def cmd_update_lock(reg: dict, lock_path: Path, online: bool, do_hash: bool) -> int:
			
 
				+    snap = collect(reg, online, do_hash)
			
 
				+    save_lock(lock_path, snap)
			
 
				+    enabled = [n for n, v in snap.items() if v.get("status") != "skipped"]
			
 
				+    print(f"已写入基线 {lock_path}（{len(enabled)} 条生效，"
			
 
				+          f"{len(snap) - len(enabled)} 条跳过）")
			
 
				+    return 0
			
 
				+
			
 
				+
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+# 入口
			
 
				+# --------------------------------------------------------------------------- #
			
 
				+def main(argv=None) -> int:
			
 
				+    parser = argparse.ArgumentParser(description="模型变更巡检工具 model_doctor")
			
 
				+    parser.add_argument("command", choices=["list", "show", "check", "update-lock"])
			
 
				+    parser.add_argument("--registry", type=Path, default=DEFAULT_REGISTRY,
			
 
				+                        help=f"清单文件，默认 {DEFAULT_REGISTRY.name}")
			
 
				+    parser.add_argument("--lock", type=Path, default=DEFAULT_LOCK,
			
 
				+                        help=f"基线文件，默认 {DEFAULT_LOCK.name}")
			
 
				+    parser.add_argument("--online", action="store_true",
			
 
				+                        help="hf 条目额外查询远端最新 commit 进行比对")
			
 
				+    parser.add_argument("--hash", dest="do_hash", action="store_true",
			
 
				+                        help="本地文件/目录额外计算快速 sha256（更敏感但更慢）")
			
 
				+    parser.add_argument("--strict", action="store_true",
			
 
				+                        help="check 时把『指纹变化』也视为失败（非 0 退出）")
			
 
				+    args = parser.parse_args(argv)
			
 
				+
			
 
				+    reg = load_registry(args.registry)
			
 
				+
			
 
				+    if args.command == "list":
			
 
				+        return cmd_list(reg)
			
 
				+    if args.command == "show":
			
 
				+        return cmd_show(reg, args.online, args.do_hash)
			
 
				+    if args.command == "check":
			
 
				+        return cmd_check(reg, args.lock, args.online, args.do_hash, args.strict)
			
 
				+    if args.command == "update-lock":
			
 
				+        return cmd_update_lock(reg, args.lock, args.online, args.do_hash)
			
 
				+    return 2
			
 
				+
			
 
				+
			
 
				+if __name__ == "__main__":
			
 
				+    sys.exit(main())
			
--- a/ocr_tools/model_doctor/model_registry.yaml
+++ b/ocr_tools/model_doctor/model_registry.yaml
@@ -0,0 +1,134 @@
 
				+# ============================================================
			
 
				+# model_doctor 模型清单（手工维护）
			
 
				+# ------------------------------------------------------------
			
 
				+# 覆盖四类模型来源：
			
 
				+#   hf          —— HuggingFace 仓库（自动下载，缓存在 defaults.hf_hub_dir）
			
 
				+#   local_file  —— 本地单个权重文件（.onnx/.gguf/.pth/...）或目录
			
 
				+#   daemon      —— 通过 HTTP 服务访问的模型（llama-server / vllm），可附带本地 GGUF 资产
			
 
				+#   mineru      —— MinerU 内置模型（校验包版本 + 可选模型根目录指纹）
			
 
				+#
			
 
				+# 指纹策略：
			
 
				+#   本地文件/目录默认用 size + mtime（快、可离线）；加 --hash 才算快速 sha256。
			
 
				+#   hf 默认只读本地快照 revision；加 --online 才查远端最新 commit 比对。
			
 
				+#   daemon 默认探测 /v1/models 是否包含 served_model + 本地 assets 指纹。
			
 
				+#
			
 
				+# 维护说明：新增/升级模型时在此增删条目，再运行 `model_doctor.py update-lock`
			
 
				+# 固化基线；日常用 `model_doctor.py check` 体检。
			
 
				+# ============================================================
			
 
				+
			
 
				+defaults:
			
 
				+  hf_hub_dir: "/Users/zhch158/models/hf_home/hub"  # HF 缓存 hub 根（= $HF_HOME/hub）
			
 
				+  hash: false        # 本地文件默认仅 size+mtime；true 则计算快速 sha256
			
 
				+  online: false      # hf 远端比对默认关闭（内网/离线场景）
			
 
				+  daemon_timeout: 3  # daemon 探测超时（秒）
			
 
				+
			
 
				+models:
			
 
				+  # ===== ① HF 仓库（自动下载，缓存在 hf_hub_dir） =====
			
 
				+  - name: docling-layout-old
			
 
				+    kind: hf
			
 
				+    repo_id: ds4sd/docling-layout-old
			
 
				+    used_by: ["layout/docling（bank_statement_* 默认布局）"]
			
 
				+    enabled: true
			
 
				+
			
 
				+  - name: pp-doclayoutv3
			
 
				+    kind: hf
			
 
				+    repo_id: PaddlePaddle/PP-DocLayoutV3_safetensors
			
 
				+    used_by: ["layout/paddle", "seal_supplement"]
			
 
				+    enabled: true
			
 
				+
			
 
				+  - name: paddleocr-vl-1.6-hf
			
 
				+    kind: hf
			
 
				+    repo_id: PaddlePaddle/PaddleOCR-VL-1.6
			
 
				+    used_by: ["PaddleOCR-VL transformers 权重（GGUF 转换源）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: mineru-2.5-pro-2604-1.2b-hf
			
 
				+    kind: hf
			
 
				+    repo_id: opendatalab/MinerU2.5-Pro-2604-1.2B
			
 
				+    used_by: ["MinerU2.5-Pro-2604-1.2B transformers 权重（GGUF 转换源）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: glm-ocr-hf
			
 
				+    kind: hf
			
 
				+    repo_id: zai-org/GLM-OCR
			
 
				+    used_by: ["GLM-OCR transformers 权重（GGUF 转换源）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: rtdetr-wired-cell-hf
			
 
				+    kind: hf
			
 
				+    repo_id: PaddlePaddle/RT-DETR-L_wired_table_cell_det
			
 
				+    used_by: ["table_recognition_wired/cell_fusion paddle格式 pdiparams"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: rdetr-h-layout-17cls-hf
			
 
				+    kind: hf
			
 
				+    repo_id: PaddlePaddle/RT-DETR-H_layout_17cls
			
 
				+    used_by: ["layout/paddle paddle格式 pdiparams"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  # ===== ② 本地权重文件 =====
			
 
				+  - name: rtdetr-wired-cell
			
 
				+    kind: local_file
			
 
				+    path: /Users/zhch158/models/pytorch_models/Table/RT-DETR-L_wired_table_cell_det.onnx
			
 
				+    used_by: ["table_recognition_wired/cell_fusion（有线表格单元格融合）"]
			
 
				+    enabled: true
			
 
				+
			
 
				+  - name: rtdetr-h-layout-17cls
			
 
				+    kind: local_file
			
 
				+    path: /Users/zhch158/models/pytorch_models/Layout/RT-DETR-H_layout_17cls.onnx
			
 
				+    used_by: ["layout/paddle（可选，默认走 HF 路径）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  # ===== ③ daemon 服务（HTTP + 关联本地 GGUF 资产） =====
			
 
				+  - name: paddleocr-vl-1.6-daemon
			
 
				+    kind: daemon
			
 
				+    server_url: http://localhost:8102
			
 
				+    served_model: PaddleOCR-VL-1.6      # 期望 /v1/models 返回包含此 id
			
 
				+    assets:
			
 
				+      - /Users/zhch158/models/PaddleOCR-VL-1.6-GGUF/PaddleOCR-VL-1.6-F16.gguf
			
 
				+      - /Users/zhch158/models/PaddleOCR-VL-1.6-GGUF/PaddleOCR-VL-1.6-F16-mmproj.gguf
			
 
				+    used_by: ["vl_recognition/paddle（bank_statement_paddle_vl_local）"]
			
 
				+    enabled: true
			
 
				+
			
 
				+  - name: glm-ocr-daemon-local
			
 
				+    kind: daemon
			
 
				+    server_url: http://localhost:8101
			
 
				+    served_model: glm-ocr
			
 
				+    assets:
			
 
				+      - /Users/zhch158/models/hf_home/hub/models--ggml-org--GLM-OCR-GGUF/snapshots/65a42de1148dbed2297e922b5dbc7d9b70c36578/GLM-OCR-Q8_0.gguf
			
 
				+      - /Users/zhch158/models/hf_home/hub/models--ggml-org--GLM-OCR-GGUF/snapshots/65a42de1148dbed2297e922b5dbc7d9b70c36578/mmproj-GLM-OCR-Q8_0.gguf
			
 
				+    used_by: ["vl_recognition/glmocr（本地）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: mineru-2.5-pro-daemon-local
			
 
				+    kind: daemon
			
 
				+    server_url: http://localhost:8103
			
 
				+    served_model: MinerU2.5-Pro-2604-1.2B
			
 
				+    assets:
			
 
				+      - /Users/zhch158/models/hf_home/hub/models--mradermacher--MinerU2.5-Pro-2604-1.2B-GGUF/snapshots/70429e9c728b6a5e904f358a9936c17bd3f5f4b8/MinerU2.5-Pro-2604-1.2B.Q8_0.gguf
			
 
				+    used_by: ["MinerU2.5 本地 VLM"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: mineru-vl-remote
			
 
				+    kind: daemon
			
 
				+    server_url: http://10.192.72.11:20006
			
 
				+    served_model: MinerU2.5
			
 
				+    used_by: ["vl_recognition/mineru_vl（远程 vllm）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  - name: paddleocr-vl-remote
			
 
				+    kind: daemon
			
 
				+    server_url: http://10.192.72.11:20016
			
 
				+    served_model: PaddleOCR-VL-0.9B
			
 
				+    used_by: ["vl_recognition/paddle（远程 vllm）"]
			
 
				+    enabled: false
			
 
				+
			
 
				+  # ===== ④ MinerU 内置模型（包版本 + 模型根目录指纹） =====
			
 
				+  - name: mineru-builtin
			
 
				+    kind: mineru
			
 
				+    package: mineru
			
 
				+    # MinerU pipeline 内置模型（layout/ocr/formula/table 等）下载根目录；
			
 
				+    # 留空（null）则仅校验包版本。常见位置：modelscope_cache 或 hf_home。
			
 
				+    model_root: /Users/zhch158/models/modelscope_cache
			
 
				+    used_by: ["preprocessor/mineru", "ocr_recognition/mineru", "table_classification/paddle"]
			
 
				+    enabled: true
			
--- a/ocr_tools/model_doctor/models.lock.json
+++ b/ocr_tools/model_doctor/models.lock.json
@@ -0,0 +1,125 @@
 
				+{
			
 
				+  "generated_at": "2026-05-29T16:06:36+08:00",
			
 
				+  "models": {
			
 
				+    "docling-layout-old": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "repo_id": "ds4sd/docling-layout-old",
			
 
				+        "local_revision": "b5b4bd59ad2b69aab715e9b1f1dfd74394c45fd4",
			
 
				+        "cached": true
			
 
				+      },
			
 
				+      "kind": "hf",
			
 
				+      "used_by": [
			
 
				+        "layout/docling（bank_statement_* 默认布局）"
			
 
				+      ]
			
 
				+    },
			
 
				+    "pp-doclayoutv3": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "repo_id": "PaddlePaddle/PP-DocLayoutV3_safetensors",
			
 
				+        "local_revision": "fc37bdafb4cb98df1750ad8d2e21e2655189b171",
			
 
				+        "cached": true
			
 
				+      },
			
 
				+      "kind": "hf",
			
 
				+      "used_by": [
			
 
				+        "layout/paddle",
			
 
				+        "seal_supplement"
			
 
				+      ]
			
 
				+    },
			
 
				+    "paddleocr-vl-1.6-hf": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "repo_id": "PaddlePaddle/PaddleOCR-VL-1.6",
			
 
				+        "local_revision": "bd1f9d64f127560f3fa49e69292486a5993782c6",
			
 
				+        "cached": true
			
 
				+      },
			
 
				+      "kind": "hf",
			
 
				+      "used_by": [
			
 
				+        "PaddleOCR-VL transformers 权重（GGUF 转换源）"
			
 
				+      ]
			
 
				+    },
			
 
				+    "rtdetr-wired-cell": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "exists": true,
			
 
				+        "size": 129353461,
			
 
				+        "mtime": 1769605421
			
 
				+      },
			
 
				+      "kind": "local_file",
			
 
				+      "used_by": [
			
 
				+        "table_recognition_wired/cell_fusion（有线表格单元格融合）"
			
 
				+      ]
			
 
				+    },
			
 
				+    "rtdetr-h-layout-17cls": {
			
 
				+      "kind": "local_file",
			
 
				+      "status": "skipped",
			
 
				+      "fingerprint": {}
			
 
				+    },
			
 
				+    "paddleocr-vl-1.6-daemon": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "server_url": "http://localhost:8102",
			
 
				+        "reachable": true,
			
 
				+        "served_models": [
			
 
				+          "PaddleOCR-VL-1.6"
			
 
				+        ],
			
 
				+        "assets": {
			
 
				+          "/Users/zhch158/models/PaddleOCR-VL-1.6-GGUF/PaddleOCR-VL-1.6-F16.gguf": {
			
 
				+            "exists": true,
			
 
				+            "size": 935768704,
			
 
				+            "mtime": 1780029330
			
 
				+          },
			
 
				+          "/Users/zhch158/models/PaddleOCR-VL-1.6-GGUF/PaddleOCR-VL-1.6-F16-mmproj.gguf": {
			
 
				+            "exists": true,
			
 
				+            "size": 880415808,
			
 
				+            "mtime": 1780029364
			
 
				+          }
			
 
				+        }
			
 
				+      },
			
 
				+      "kind": "daemon",
			
 
				+      "used_by": [
			
 
				+        "vl_recognition/paddle（bank_statement_paddle_vl_local）"
			
 
				+      ]
			
 
				+    },
			
 
				+    "glm-ocr-daemon-local": {
			
 
				+      "kind": "daemon",
			
 
				+      "status": "skipped",
			
 
				+      "fingerprint": {}
			
 
				+    },
			
 
				+    "mineru-2.5-pro-daemon-local": {
			
 
				+      "kind": "daemon",
			
 
				+      "status": "skipped",
			
 
				+      "fingerprint": {}
			
 
				+    },
			
 
				+    "mineru-vl-remote": {
			
 
				+      "kind": "daemon",
			
 
				+      "status": "skipped",
			
 
				+      "fingerprint": {}
			
 
				+    },
			
 
				+    "paddleocr-vl-remote": {
			
 
				+      "kind": "daemon",
			
 
				+      "status": "skipped",
			
 
				+      "fingerprint": {}
			
 
				+    },
			
 
				+    "mineru-builtin": {
			
 
				+      "status": "ok",
			
 
				+      "fingerprint": {
			
 
				+        "package": "mineru",
			
 
				+        "package_version": "3.1.13",
			
 
				+        "model_root": "/Users/zhch158/models/modelscope_cache",
			
 
				+        "model_root_fp": {
			
 
				+          "exists": true,
			
 
				+          "file_count": 10,
			
 
				+          "total_size": 122467480,
			
 
				+          "tree_sha256": "7d579059c6b11dca03920f2fa65553ac03fb37dc8b9075a81acb65bde93cc3d8"
			
 
				+        }
			
 
				+      },
			
 
				+      "kind": "mineru",
			
 
				+      "used_by": [
			
 
				+        "preprocessor/mineru",
			
 
				+        "ocr_recognition/mineru",
			
 
				+        "table_classification/paddle"
			
 
				+      ]
			
 
				+    }
			
 
				+  }
			
 
				+}
			
--- a/ocr_tools/ocr_batch/processor_configs.yaml
+++ b/ocr_tools/ocr_batch/processor_configs.yaml
@@ -110,6 +110,22 @@ processors:
 
				     venv: "conda activate mineru"
			
 
				     description: "YUSYS(local) Wired UNET OCR PaddleOCR-VL"
			
 
				 
			
 
				+  yusys_mineruocr_local:
			
 
				+    script: "/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/main_v2.py"
			
 
				+    input_arg: "--input"
			
 
				+    output_arg: "--output_dir"
			
 
				+    scene_arg: "--scene"
			
 
				+    extra_args:
			
 
				+      - "--config=/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/config/bank_statement_mineru_vl_local.yaml"
			
 
				+      - "--pages=1-35"
			
 
				+      - "--streaming"
			
 
				+      - "--debug"
			
 
				+      - "--log_level=DEBUG"
			
 
				+    output_subdir: "bank_statement_yusys_mineruocr_local"
			
 
				+    log_subdir: "logs/bank_statement_yusys_mineruocr_local"
			
 
				+    venv: "conda activate mineru"
			
 
				+    description: "YUSYS(local) Wired UNET OCR MinerU-VL"
			
 
				+
			
 
				   # -------------------------------------------------------------------------
			
 
				   # PaddleOCR-VL 处理器
			
 
				   # -------------------------------------------------------------------------
			
--- a/ocr_tools/universal_doc_parser/config/bank_statement_mineru_vl_local.yaml
+++ b/ocr_tools/universal_doc_parser/config/bank_statement_mineru_vl_local.yaml
@@ -0,0 +1,264 @@
 
				+# 银行交易流水场景配置（增强版）
			
 
				+scene_name: "bank_statement_mineru_vl_local"
			
 
				+description: "银行交易流水、对账单等场景"
			
 
				+
			
 
				+input:
			
 
				+  supported_formats: [".pdf", ".png", ".jpg", ".jpeg", ".bmp", ".tiff"]
			
 
				+  dpi: 200  # PDF转图片的DPI
			
 
				+  txt_pdf_watermark_removal:
			
 
				+    enabled: true   # 文字型PDF渲染前去除水印XObject（保留文字可搜索性）
			
 
				+    sample_pages: 3  # 扫描前N页快速预检
			
 
				+
			
 
				+preprocessor:
			
 
				+  module: "mineru"
			
 
				+  # 页级预处理顺序：orient_first=先扶正再去水印（银行斜纹水印推荐）；watermark_first=兼容旧行为
			
 
				+  order: orient_first
			
 
				+  orientation_classifier:
			
 
				+    enabled: true
			
 
				+    model_name: "paddle_orientation_classification"
			
 
				+    model_dir: null  # 使用默认路径
			
 
				+  unwarping:
			
 
				+    enabled: false
			
 
				+  # 页级水印（细参见 ocr_utils/watermark/presets.py PAGE_WATERMARK_PRESETS）
			
 
				+  watermark_removal:
			
 
				+    enabled: false
			
 
				+    detect_before_remove: true
			
 
				+    method: threshold   # threshold | masked | masked_adaptive
			
 
				+    threshold: 175
			
 
				+    contrast_enhancement:
			
 
				+      enabled: false
			
 
				+      method: text_restore
			
 
				+      text_black_target: 85
			
 
				+    debug_options:
			
 
				+      enabled: false
			
 
				+      output_dir: null
			
 
				+      prefix: ""
			
 
				+      subdir: watermark_removal
			
 
				+      save_compare: true
			
 
				+      image_format: "png"
			
 
				+
			
 
				+# ============================================================
			
 
				+# Layout 检测配置 - 智能路由器（按场景直接选择模型）
			
 
				+# ============================================================
			
 
				+layout_detection:
			
 
				+  module: "smart_router"
			
 
				+  strategy: "scene"  # 按场景直接选择模型，不走ocr_eval
			
 
				+
			
 
				+  # 场景策略：指定场景直接选用的布局模型
			
 
				+  scene_strategy:
			
 
				+    bank_statement:
			
 
				+      model: "docling"
			
 
				+    financial_report:
			
 
				+      model: "paddle_ppdoclayoutv3"
			
 
				+  default_model: "docling"
			
 
				+
			
 
				+  # 配置多个模型
			
 
				+  models:
			
 
				+    docling:
			
 
				+      module: "docling"
			
 
				+      model_name: "docling-layout-old"
			
 
				+      model_dir: "ds4sd/docling-layout-old"
			
 
				+      device: "cpu"
			
 
				+      conf: 0.3
			
 
				+      num_threads: 4
			
 
				+
			
 
				+    paddle_ppdoclayoutv3:
			
 
				+      module: "paddle"
			
 
				+      model_name: "PP-DocLayoutV3"
			
 
				+      model_dir: "PaddlePaddle/PP-DocLayoutV3_safetensors"
			
 
				+      device: "cpu"
			
 
				+      conf: 0.3
			
 
				+      num_threads: 4
			
 
				+      batch_size: 1
			
 
				+  
			
 
				+  # 后处理配置
			
 
				+  post_process:
			
 
				+    # 将大面积文本块转换为表格（后处理）
			
 
				+    convert_large_text_to_table: true  # 是否启用
			
 
				+    min_text_area_ratio: 0.25         # 最小面积占比（25%）
			
 
				+    min_text_width_ratio: 0.4         # 最小宽度占比（40%）
			
 
				+    min_text_height_ratio: 0.3        # 最小高度占比（30%）
			
 
				+
			
 
				+  # 印章补充检测：使用 PP-DocLayoutV3 补充 docling 无法识别的密封区域
			
 
				+  seal_supplement:
			
 
				+    enabled: true                # 启用 seal 补充检测
			
 
				+    replace_existing: false      # false=增量合并; true=完全替换主结果中已有 seal
			
 
				+    replace_overlapping_image: true   # seal 与 image_body/image 等高 IoU 时替换为 seal（非丢弃）
			
 
				+    replace_iou_threshold: 0.7        # 触发替换的最小 IoU
			
 
				+    duplicate_iou_threshold: 0.3      # 未替换时，与任意框 IoU 超此值视为重复 seal
			
 
				+    # seal_detector 使用的模型配置，默认复用 paddle_ppdoclayoutv3 的配置
			
 
				+    model_config:
			
 
				+      module: "paddle"
			
 
				+      model_name: "PP-DocLayoutV3"
			
 
				+      model_dir: "PaddlePaddle/PP-DocLayoutV3_safetensors"
			
 
				+      device: "cpu"
			
 
				+      conf: 0.3
			
 
				+      num_threads: 4
			
 
				+
			
 
				+  # Debug 可视化（底图为 inference_image，与 Layout 检测输入一致）
			
 
				+  debug_options:
			
 
				+    enabled: false              # 由命令行 --debug / --debug-layout 控制
			
 
				+    output_dir: null            # null 时由 pipeline 按页注入
			
 
				+    prefix: ""
			
 
				+    subdir: layout_detection    # 输出至 debug/layout_detection/
			
 
				+    save_raw: true              # 后处理前
			
 
				+    save_post_processed: true   # 后处理后
			
 
				+    save_json: true
			
 
				+    image_format: "png"
			
 
				+
			
 
				+# ============================================================
			
 
				+# OCR 识别配置
			
 
				+# ============================================================
			
 
				+ocr_recognition:
			
 
				+  module: "mineru"
			
 
				+  language: "ch"
			
 
				+  det_threshold: 0.5
			
 
				+  unclip_ratio: 1.5
			
 
				+  enable_merge_det_boxes: false
			
 
				+  batch_size: 8
			
 
				+  device: "cpu"
			
 
				+
			
 
				+  # Debug 可视化（底图为 inference_image，与整页 OCR 输入一致）
			
 
				+  debug_options:
			
 
				+    enabled: false              # 由命令行 --debug / --debug-ocr 控制
			
 
				+    output_dir: null
			
 
				+    prefix: ""
			
 
				+    subdir: ocr_recognition     # 输出至 debug/ocr_recognition/
			
 
				+    save_json: true
			
 
				+    image_format: png
			
 
				+
			
 
				+# ============================================================
			
 
				+# 表格分类配置（自动区分有线/无线表格）
			
 
				+# ============================================================
			
 
				+table_classification:
			
 
				+  enabled: true               # 启用自动表格分类
			
 
				+  module: "paddle"            # 分类模型：paddle（MinerU PaddleTableClsModel）
			
 
				+  confidence_threshold: 0.5   # 分类置信度阈值
			
 
				+  batch_size: 16              # 批处理大小
			
 
				+
			
 
				+  # Debug 可视化配置
			
 
				+  debug_options:
			
 
				+    enabled: false              # 由命令行 --debug / --debug-table 统一控制
			
 
				+    output_dir: null            # null 时由 pipeline 按页注入
			
 
				+    prefix: ""
			
 
				+    subdir: table_classification  # 输出至 debug/table_classification/
			
 
				+    save_table_lines: true      # paddle 线条检测叠加图
			
 
				+    image_format: "png"
			
 
				+
			
 
				+# ============================================================
			
 
				+# 有线表格识别专用配置（MinerU UNet）
			
 
				+# ============================================================
			
 
				+table_recognition_wired:
			
 
				+  use_wired_unet: false
			
 
				+  upscale_ratio: 3.333
			
 
				+  need_ocr: true
			
 
				+  row_threshold: 10
			
 
				+  col_threshold: 15
			
 
				+  ocr_conf_threshold: 0.9       # 单元格 OCR 置信度阈值
			
 
				+  cell_crop_margin: 2
			
 
				+  use_custom_postprocess: true  # 是否使用自定义后处理（默认启用）
			
 
				+
			
 
				+  # 是否启用倾斜矫正
			
 
				+  enable_deskew: true
			
 
				+
			
 
				+  # 🆕 启用多源单元格融合
			
 
				+  use_cell_fusion: true
			
 
				+  
			
 
				+  # 融合引擎配置
			
 
				+  cell_fusion:
			
 
				+    # RT-DETR 模型路径（必需）
			
 
				+    rtdetr_model_path: "/Users/zhch158/models/pytorch_models/Table/RT-DETR-L_wired_table_cell_det.onnx"
			
 
				+    
			
 
				+    # 融合权重
			
 
				+    unet_weight: 0.6        # UNet 权重（结构性强）
			
 
				+    rtdetr_weight: 0.4      # RT-DETR 权重（鲁棒性强）
			
 
				+    
			
 
				+    # 阈值配置
			
 
				+    iou_merge_threshold: 0.7    # 高IoU合并阈值（>0.7则加权平均）
			
 
				+    iou_nms_threshold: 0.5      # NMS去重阈值
			
 
				+    rtdetr_conf_threshold: 0.5  # RT-DETR置信度阈值
			
 
				+    
			
 
				+    # 功能开关
			
 
				+    enable_ocr_compensation: true      # 启用OCR边缘补偿
			
 
				+
			
 
				+  # 单元格二次 OCR（参数对齐 cell_sweep lab：threshold_t150_cl_1.0_8_ob_u128 / Pass2 tile=4）
			
 
				+  second_pass_ocr:
			
 
				+    reocr_mode: bank_statement
			
 
				+    line_min_score: 0.8
			
 
				+    cell_preprocess:
			
 
				+      watermark:
			
 
				+        enabled: true
			
 
				+        method: threshold
			
 
				+        threshold: 150
			
 
				+      contrast:                      # Pass1：去水印后 CLAHE
			
 
				+        enabled: true
			
 
				+        method: clahe
			
 
				+        clip_limit: 1.0
			
 
				+        tile_grid_size: 8
			
 
				+      upscale_min_side: 96          # Pass1：常规二次 OCR 放大最短边
			
 
				+      enhance_retry:                   # Pass2：低分/难例再试（可单独配置 upscale + contrast）
			
 
				+        enabled: true
			
 
				+        upscale_min_side: 128         # Pass2 放大最短边；未配置时沿用 Pass1
			
 
				+        contrast:
			
 
				+          enabled: true
			
 
				+          method: clahe
			
 
				+          clip_limit: 1.0
			
 
				+          tile_grid_size: 4
			
 
				+
			
 
				+  # Debug 可视化配置
			
 
				+  debug_options:
			
 
				+    enabled: false              # 由命令行 --debug / --debug-table 统一控制
			
 
				+    output_dir: null            # null 时由 pipeline 按页注入
			
 
				+    prefix: ""
			
 
				+    subdir: table_recognition_wired  # 输出至 debug/table_recognition_wired/
			
 
				+    save_table_lines: true
			
 
				+    save_connected_components: true
			
 
				+    save_grid_structure: true
			
 
				+    save_text_overlay: true
			
 
				+    image_format: "png"
			
 
				+    # 单元格二次 OCR 裁剪图：debug/table_recognition_wired/tablecell_ocr/
			
 
				+
			
 
				+# ============================================================
			
 
				+# VL识别配置 - 使用 PaddleOcr-VL（无线表格 + seal识别）
			
 
				+# ============================================================
			
 
				+vl_recognition:
			
 
				+  module: "mineru"
			
 
				+  backend: "http-client"
			
 
				+  model_name: "MinerU2.5-Pro-2604-1.2B"  # 与 mineru_local_daemon.sh 中 MODEL_NAME 一致
			
 
				+  server_url: "http://localhost:8103"
			
 
				+  max_image_size: 4096  # 🔧 添加：最大图片尺寸
			
 
				+  resize_mode: 'max'    # 🔧 添加：缩放模式 ('max' 保持宽高比, 'fixed' 固定尺寸)
			
 
				+  device: "cpu"
			
 
				+  batch_size: 1
			
 
				+  model_params:
			
 
				+    max_concurrency: 10
			
 
				+    http_timeout: 600
			
 
				+  
			
 
				+  # Task prompt mapping - 针对不同任务使用不同提示词
			
 
				+  task_prompt_mapping:
			
 
				+    text: "Text Recognition:"
			
 
				+    table: "Table Recognition:"
			
 
				+    formula: "Formula Recognition:"
			
 
				+    seal: "Seal Recognition:"  # 印章识别的专用提示词
			
 
				+  
			
 
				+  # 场景特定配置
			
 
				+  table_recognition:
			
 
				+
			
 
				+# ============================================================
			
 
				+# 输出配置
			
 
				+# ============================================================
			
 
				+output:
			
 
				+  create_subdir: false
			
 
				+  save_pdf_images: true
			
 
				+  save_json: true
			
 
				+  save_page_json: true
			
 
				+  save_markdown: true
			
 
				+  save_page_markdown: true
			
 
				+  save_html: true
			
 
				+  save_layout_image: true
			
 
				+  save_ocr_image: true
			
 
				+  draw_type_label: true
			
 
				+  draw_bbox_number: true
			
 
				+  save_enhanced_json: true
			
 
				+  normalize_numbers: true
			
 
				+  debug_mode: false
			
--- a/ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl_local.yaml
+++ b/ocr_tools/universal_doc_parser/config/bank_statement_paddle_vl_local.yaml
@@ -82,6 +82,22 @@ layout_detection:
 
				     min_text_width_ratio: 0.4         # 最小宽度占比（40%）
			
 
				     min_text_height_ratio: 0.3        # 最小高度占比（30%）
			
 
				 
			
 
				+  # 印章补充检测：使用 PP-DocLayoutV3 补充 docling 无法识别的密封区域
			
 
				+  seal_supplement:
			
 
				+    enabled: true                # 启用 seal 补充检测
			
 
				+    replace_existing: false      # false=增量合并; true=完全替换主结果中已有 seal
			
 
				+    replace_overlapping_image: true   # seal 与 image_body/image 等高 IoU 时替换为 seal（非丢弃）
			
 
				+    replace_iou_threshold: 0.7        # 触发替换的最小 IoU
			
 
				+    duplicate_iou_threshold: 0.3      # 未替换时，与任意框 IoU 超此值视为重复 seal
			
 
				+    # seal_detector 使用的模型配置，默认复用 paddle_ppdoclayoutv3 的配置
			
 
				+    model_config:
			
 
				+      module: "paddle"
			
 
				+      model_name: "PP-DocLayoutV3"
			
 
				+      model_dir: "PaddlePaddle/PP-DocLayoutV3_safetensors"
			
 
				+      device: "cpu"
			
 
				+      conf: 0.3
			
 
				+      num_threads: 4
			
 
				+
			
 
				   # Debug 可视化（底图为 inference_image，与 Layout 检测输入一致）
			
 
				   debug_options:
			
 
				     enabled: false              # 由命令行 --debug / --debug-layout 控制
			
@@ -105,7 +121,6 @@ ocr_recognition:
 
				   batch_size: 8
			
 
				   device: "cpu"
			
 
				 
			
 
				-
			
 
				   # Debug 可视化（底图为 inference_image，与整页 OCR 输入一致）
			
 
				   debug_options:
			
 
				     enabled: false              # 由命令行 --debug / --debug-ocr 控制
			
@@ -137,7 +152,7 @@ table_classification:
 
				 # 有线表格识别专用配置（MinerU UNet）
			
 
				 # ============================================================
			
 
				 table_recognition_wired:
			
 
				-  use_wired_unet: true
			
 
				+  use_wired_unet: false
			
 
				   upscale_ratio: 3.333
			
 
				   need_ocr: true
			
 
				   row_threshold: 10
			
@@ -212,7 +227,7 @@ table_recognition_wired:
 
				 vl_recognition:
			
 
				   module: "paddle"
			
 
				   backend: "http-client"
			
 
				-  model_name: "PaddleOCR-VL-1.5"  # 与 paddle_local_daemon.sh 中 MODEL_NAME 一致
			
 
				+  model_name: "PaddleOCR-VL-1.6"  # 与 paddle_local_daemon.sh 中 MODEL_NAME 一致
			
 
				   server_url: "http://localhost:8102"
			
 
				   max_image_size: 4096  # 🔧 添加：最大图片尺寸
			
 
				   resize_mode: 'max'    # 🔧 添加：缩放模式 ('max' 保持宽高比, 'fixed' 固定尺寸)
			
--- a/ocr_tools/universal_doc_parser/main_v2.py
+++ b/ocr_tools/universal_doc_parser/main_v2.py
@@ -654,10 +654,15 @@ if __name__ == "__main__":
 
				             # "config": "./config/bank_statement_paddle_vl_local.yaml",
			
 
				             # "log_file": "./output/logs/bank_statement_paddle_vl_local/process.log",
			
 
				 
			
 
				+            # "input": "/Users/zhch158/workspace/data/流水分析/陈3_微信图.pdf",
			
 
				+            # "output_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图/bank_statement_yusys_local",
			
 
				+            # "config": "./config/bank_statement_yusys_local.yaml",
			
 
				+            # "log_file": "./output/logs/陈3_微信图/bank_statement_yusys_local/process.log",
			
 
				+
			
 
				             "input": "/Users/zhch158/workspace/data/流水分析/陈3_微信图.pdf",
			
 
				-            "output_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图/bank_statement_yusys_local",
			
 
				-            "config": "./config/bank_statement_yusys_local.yaml",
			
 
				-            "log_file": "./output/logs/陈3_微信图/bank_statement_yusys_local/process.log",
			
 
				+            "output_dir": "/Users/zhch158/workspace/data/流水分析/陈3_微信图/bank_statement_mineru_vl_local",
			
 
				+            "config": "./config/bank_statement_mineru_vl_local.yaml",
			
 
				+            "log_file": "./output/logs/陈3_微信图/bank_statement_mineru_vl_local/process.log",
			
 
				 
			
 
				             # "input": "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行.pdf",
			
 
				             # "output_dir": "/Users/zhch158/workspace/data/流水分析/彭_广东兴宁农村商业银行/bank_statement_yusys_local",
			
--- a/ocr_tools/universal_doc_parser/models/adapters/_mineru_vl_patches.py
+++ b/ocr_tools/universal_doc_parser/models/adapters/_mineru_vl_patches.py
@@ -0,0 +1,92 @@
 
				+"""mineru_vl_utils 运行时补丁集合。
			
 
				+
			
 
				+集中存放对第三方库 ``mineru_vl_utils`` 的运行时修补（monkey-patch），
			
 
				+目的是在**不修改第三方源码**的前提下修复其在 PaddleOCR-VL 模型上的兼容性问题，
			
 
				+并保证补丁随本仓库一起版本化、可随时开关、升级第三方库后不会丢失。
			
 
				+
			
 
				+当前包含的补丁：
			
 
				+
			
 
				+1. ``patch_convert_otsl_to_html``
			
 
				+   修复 PaddleOCR-VL 输出的 OTSL「整表首个单元格缺少前导 ``<fcel>`` token」
			
 
				+   导致 ``otsl_parse_texts`` 文本错位、所有单元格文字丢失的问题。
			
 
				+
			
 
				+统一通过 :func:`apply_once` 应用，幂等且仅在首次调用时生效。
			
 
				+"""
			
 
				+
			
 
				+from __future__ import annotations
			
 
				+
			
 
				+from loguru import logger
			
 
				+
			
 
				+# OTSL 结构 token；与 mineru_vl_utils.post_process 内部定义保持一致
			
 
				+_OTSL_STRUCT_TOKENS = ("<nl>", "<fcel>", "<ecel>", "<lcel>", "<ucel>", "<xcel>")
			
 
				+
			
 
				+_applied = False
			
 
				+
			
 
				+
			
 
				+def _make_otsl_normalizer(orig_convert):
			
 
				+    """生成一个在调用原始 convert_otsl_to_html 前先归一化 OTSL 的包装函数。"""
			
 
				+
			
 
				+    def _normalize_then_convert(otsl_content):
			
 
				+        if isinstance(otsl_content, str):
			
 
				+            stripped = otsl_content.lstrip()
			
 
				+            # 整表首格缺少前导结构 token（如 PaddleOCR-VL）时补 <fcel>，
			
 
				+            # 否则 otsl_parse_texts 的 text_idx 会永久错位，导致全部单元格文字丢失。
			
 
				+            if (
			
 
				+                stripped
			
 
				+                and not stripped.startswith("<table")
			
 
				+                and not stripped.startswith(_OTSL_STRUCT_TOKENS)
			
 
				+            ):
			
 
				+                otsl_content = "<fcel>" + stripped
			
 
				+        return orig_convert(otsl_content)
			
 
				+
			
 
				+    # 记录原始函数，便于排查与还原
			
 
				+    _normalize_then_convert.__wrapped__ = orig_convert
			
 
				+    return _normalize_then_convert
			
 
				+
			
 
				+
			
 
				+def _patch_convert_otsl_to_html():
			
 
				+    """替换 post_process 命名空间中的 convert_otsl_to_html。
			
 
				+
			
 
				+    mineru_vl_utils.post_process.__init__ 通过
			
 
				+    ``from .otsl2html import convert_otsl_to_html`` 导入该函数，
			
 
				+    其内部 simple_process / _convert_pure_table_content_to_html 在调用时
			
 
				+    按 post_process 模块全局名查找，因此覆盖该命名空间即可拦截全部内部调用。
			
 
				+    """
			
 
				+    import mineru_vl_utils.post_process as pp
			
 
				+    from mineru_vl_utils.post_process import otsl2html
			
 
				+
			
 
				+    orig = getattr(otsl2html, "convert_otsl_to_html", None)
			
 
				+    if orig is None:
			
 
				+        # 上游接口变更时大声报错，避免补丁静默失效后又开始丢字
			
 
				+        raise RuntimeError(
			
 
				+            "mineru_vl_utils 接口已变更：找不到 otsl2html.convert_otsl_to_html，"
			
 
				+            "请检查第三方库版本并更新补丁。"
			
 
				+        )
			
 
				+
			
 
				+    wrapped = _make_otsl_normalizer(orig)
			
 
				+    # 关键：post_process 内部调用按此命名空间查找
			
 
				+    pp.convert_otsl_to_html = wrapped
			
 
				+    # 兜底：若有代码直接 import otsl2html.convert_otsl_to_html
			
 
				+    otsl2html.convert_otsl_to_html = wrapped
			
 
				+
			
 
				+
			
 
				+def apply_once() -> bool:
			
 
				+    """应用全部 mineru_vl_utils 运行时补丁，幂等。
			
 
				+
			
 
				+    应在任何 ``content_extract`` / ``batch_content_extract`` 调用之前执行一次
			
 
				+    （通常放在 VL 识别器/检测器的 ``initialize()`` 内、获取模型之前）。
			
 
				+
			
 
				+    Returns:
			
 
				+        bool: 本次调用是否真正应用了补丁（首次为 True，后续为 False）。
			
 
				+    """
			
 
				+    global _applied
			
 
				+    if _applied:
			
 
				+        return False
			
 
				+    try:
			
 
				+        _patch_convert_otsl_to_html()
			
 
				+        _applied = True
			
 
				+        logger.info("已应用 mineru_vl_utils 补丁：OTSL 整表首格 <fcel> 归一化")
			
 
				+        return True
			
 
				+    except Exception as e:  # 补丁失败不应阻断主流程，但需明确告警
			
 
				+        logger.error(f"应用 mineru_vl_utils 补丁失败：{e}")
			
 
				+        raise
			
--- a/ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py
+++ b/ocr_tools/universal_doc_parser/models/adapters/mineru_adapter.py
@@ -361,6 +361,17 @@ class MinerUVLRecognizer(BaseVLRecognizer):
 
				         # 🔧 添加图片尺寸限制配置
			
 
				         self.max_image_size = config.get('max_image_size', 1568)  # VLM 模型的最大尺寸
			
 
				         self.resize_mode = config.get('resize_mode', 'max')  # 'max' or 'fixed'
			
 
				+
			
 
				+        # 应用 mineru_vl_utils 运行时补丁（修复 PaddleOCR-VL OTSL 首格 <fcel> 缺失导致表格文字丢失）
			
 
				+        # 放在 __init__ 中，可同时覆盖 mineru 与 paddle 两条路径：
			
 
				+        # PaddleVLRecognizer 重写了 initialize() 但其 __init__ 会经 super().__init__ 到达这里。
			
 
				+        # 补丁仅替换 mineru_vl_utils.post_process 内函数，无需模型已加载，且幂等。
			
 
				+        try:
			
 
				+            from ._mineru_vl_patches import apply_once as _apply_mineru_vl_patches
			
 
				+            _apply_mineru_vl_patches()
			
 
				+        except Exception as e:
			
 
				+            # 补丁失败不应阻断识别器创建，退回默认行为，但需明确告警
			
 
				+            logger.warning(f"应用 mineru_vl_utils 补丁失败（退回默认行为，表格可能丢字）: {e}")
			
 
				         
			
 
				     def initialize(self):
			
 
				         """初始化VL模型"""
			
--- a/ocr_utils/module_debug_viz.py
+++ b/ocr_utils/module_debug_viz.py
@@ -19,6 +19,17 @@ from PIL import Image
 
				 MODULE_DEBUG_ROOT = "debug"
			
 
				 
			
 
				 
			
 
				+def _json_default(o: Any):
			
 
				+    """json.dumps 的兜底序列化：处理 numpy 标量/数组（如 OCR confidence 的 float32）。"""
			
 
				+    if isinstance(o, np.generic):
			
 
				+        return o.item()
			
 
				+    if isinstance(o, np.ndarray):
			
 
				+        return o.tolist()
			
 
				+    if isinstance(o, (set, tuple)):
			
 
				+        return list(o)
			
 
				+    raise TypeError(f"Object of type {o.__class__.__name__} is not JSON serializable")
			
 
				+
			
 
				+
			
 
				 def resolve_module_debug_dir(
			
 
				     output_dir: Union[str, Path],
			
 
				     subdir: str,
			
@@ -286,7 +297,7 @@ def save_layout_debug(
 
				             }
			
 
				             json_path = debug_dir / f'{page_name}_layout_{suffix}.json'
			
 
				             json_path.write_text(
			
 
				-                json.dumps(json_data, ensure_ascii=False, indent=2),
			
 
				+                json.dumps(json_data, ensure_ascii=False, indent=2, default=_json_default),
			
 
				                 encoding='utf-8',
			
 
				             )
			
 
				             paths['json'] = str(json_path)
			
@@ -334,7 +345,7 @@ def save_ocr_debug(
 
				             }
			
 
				             json_path = debug_dir / f'{page_name}_ocr_spans.json'
			
 
				             json_path.write_text(
			
 
				-                json.dumps(json_data, ensure_ascii=False, indent=2),
			
 
				+                json.dumps(json_data, ensure_ascii=False, indent=2, default=_json_default),
			
 
				                 encoding='utf-8',
			
 
				             )
			
 
				             paths['json'] = str(json_path)
			
--- a/ocr_validator/config/彭_广东兴宁农村商业银行.yaml
+++ b/ocr_validator/config/彭_广东兴宁农村商业银行.yaml
@@ -18,3 +18,17 @@ document:
 
				       image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
			
 
				       description: "YUSYS-OCR框架(local) GLM-OCR VLM"
			
 
				       enabled: true
			
 
				+
			
 
				+    # bank_statement_paddleocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_paddleocr_local"
			
 
				+      image_dir: "bank_statement_yusys_paddleocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) PaddleOCR VLM"
			
 
				+      enabled: true
			
 
				+
			
 
				+    # bank_statement_mineruocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_mineruocr_local"
			
 
				+      image_dir: "bank_statement_yusys_mineruocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) MinerU-VL"
			
 
				+      enabled: true
			
--- a/ocr_validator/config/钟_广东陆丰农村商业银行.yaml
+++ b/ocr_validator/config/钟_广东陆丰农村商业银行.yaml
@@ -18,3 +18,17 @@ document:
 
				       image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
			
 
				       description: "YUSYS-OCR框架(local) GLM-OCR VLM"
			
 
				       enabled: true
			
 
				+
			
 
				+    # bank_statement_paddleocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_paddleocr_local"
			
 
				+      image_dir: "bank_statement_yusys_paddleocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) PaddleOCR VLM"
			
 
				+      enabled: true
			
 
				+
			
 
				+    # bank_statement_mineruocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_mineruocr_local"
			
 
				+      image_dir: "bank_statement_yusys_mineruocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) MinerU-VL"
			
 
				+      enabled: true
			
--- a/ocr_validator/config/陈3_微信图.yaml
+++ b/ocr_validator/config/陈3_微信图.yaml
@@ -18,3 +18,17 @@ document:
 
				       image_dir: "bank_statement_yusys_glmocr_local/{{name}}"
			
 
				       description: "YUSYS-OCR框架(local) GLM-OCR VLM"
			
 
				       enabled: true
			
 
				+
			
 
				+    # bank_statement_paddleocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_paddleocr_local"
			
 
				+      image_dir: "bank_statement_yusys_paddleocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) PaddleOCR VLM"
			
 
				+      enabled: true
			
 
				+
			
 
				+    # bank_statement_mineruocr_local
			
 
				+    - tool: "mineru"
			
 
				+      result_dir: "bank_statement_yusys_mineruocr_local"
			
 
				+      image_dir: "bank_statement_yusys_mineruocr_local/{{name}}"
			
 
				+      description: "YUSYS-OCR框架(local) MinerU-VL"
			
 
				+      enabled: true
نویسنده	SHA1 پیام	تاریخ
zhch158_admin	9413ec2600 feat(更新MinerU本地OCR配置): 修改main_v2.py中的输出目录、配置文件和日志文件路径，以支持新的MinerU处理方式，提升文档解析的灵活性与准确性。	1 ماه پیش
zhch158_admin	da4189fde7 feat(新增MinerU本地OCR配置): 在多个配置文件中新增对MinerU的支持，添加相应的工具、结果目录和描述信息，提升OCR框架的灵活性与可用性。	1 ماه پیش
zhch158_admin	e4978b5cce feat(新增JSON序列化支持): 在module_debug_viz.py中新增_json_default函数，增强json.dumps的序列化能力，支持numpy标量/数组、集合和元组的序列化，提升调试信息的可读性与兼容性。	1 ماه پیش
zhch158_admin	0cb48eed12 feat(新增银行交易流水场景配置): 新增bank_statement_mineru_vl_local.yaml配置文件，支持银行交易流水和对账单的文档解析，包含输入输出参数、预处理、布局检测、OCR识别及表格分类等功能，提升文档解析的灵活性与准确性。	1 ماه پیش
zhch158_admin	b599507513 feat(新增YUSYS本地OCR配置): 在processor_configs.yaml中新增yusys_mineruocr_local配置，支持本地文档解析，包含输入输出参数、额外参数及日志目录设置，提升OCR处理的灵活性与可用性。	1 ماه پیش
zhch158_admin	c816ff91ca feat(新增模型变更巡检工具): 新增model_doctor工具，支持对模型清单进行指纹采集与基线比对，提供模型变更、缺失、服务不可达等状态报告，提升模型管理的可视化与监控能力。同时新增手动维护的模型清单和指纹基线文件，完善文档说明与使用示例。	1 ماه پیش
zhch158_admin	3764003f18 feat(删除PaddleOCR-VL 1.6到GGUF转换文档): 删除paddleocr_vl 1.6到GGUF的转换方案文档，清理不再需要的文件，保持文档结构的整洁性。	1 ماه پیش
zhch158_admin	327ef352f5 feat(更新本地守护进程脚本): 修改mineru_local_daemon.sh脚本，更新模型路径和文件名，添加llama-server可执行文件路径检查，优化启动和配置提示信息，提升本地服务的可用性与用户体验。	1 ماه پیش
zhch158_admin	556b67d19f feat(新增GGUF转换方案文档): 新增HF_safetensors->GGUF.md文档，详细说明PaddleOCR-VL 1.6到GGUF的转换步骤与注意事项，提供两条高效路径，解决OTSL结构token过滤问题，提升用户在模型转换过程中的指导性与可操作性。	1 ماه پیش
zhch158_admin	4af3067a19 feat(更新PaddleOCR配置): 修改main_v2.py中的PaddleOCR配置，调整输出目录、配置文件和日志文件路径，以支持新的处理方式，提升文档解析的灵活性与准确性。	1 ماه پیش
zhch158_admin	213a1ca9f2 feat(新增PaddleOCR本地配置): 在多个配置文件中新增对PaddleOCR的支持，添加相应的工具、结果目录和描述信息，提升OCR框架的灵活性与可用性。	1 ماه پیش
zhch158_admin	4e44a6c829 feat(新增mineru_vl_utils运行时补丁): 新增对PaddleOCR-VL的OTSL转换补丁，修复表格首格缺失前导结构token的问题，确保输出HTML中完整保留文本，提升文档解析的准确性与可靠性。同时在MinerUVLRecognizer初始化中应用该补丁，确保兼容性。	1 ماه پیش
zhch158_admin	beb41fe75e feat(新增印章补充检测功能): 在bank_statement_paddle_vl_local.yaml中新增印章补充检测配置，优化密封区域识别能力，提升文档解析的准确性与完整性。同时更新表格识别配置，调整PaddleOCR-VL模型版本至1.6，增强整体OCR性能。	1 ماه پیش
zhch158_admin	9e171404ce feat(新增PaddleOCR-VL表格文字丢失问题补丁): 新增运行时补丁模块，修复PaddleOCR-VL在OTSL转换过程中表格首格文字丢失的问题，确保输出HTML中保留完整文本，提升文档解析的准确性与可靠性。	1 ماه پیش
zhch158_admin	2257f5093d feat(新增PaddleOCR-VL 1.6到GGUF转换文档): 新增paddleocr_vl 1.6到GGUF的详细转换方案，提供两条高效路径，包含社区资源和自转步骤，提升用户在模型转换过程中的指导性与可操作性。	1 ماه پیش
zhch158_admin	497c6aa2de feat(新增PaddleOCR-VL本地服务脚本): 新增paddle_local_daemon_1.6.sh脚本，支持在macOS上启动PaddleOCR-VL本地llama-server服务，配置模型路径、参数及日志管理，提升本地OCR服务的可用性与易用性。	1 ماه پیش