
Add local daemon scripts for GLM-OCR and PaddleOCR-VL with image processing capabilities

- Introduced `glmocr_local_daemon.sh` for managing GLM-OCR llama-server on macOS, including start, stop, status, and logging functionalities.
- Added `paddle_local_daemon.sh` for managing PaddleOCR-VL llama-server with similar functionalities.
- Created `curl_local_ocr.sh` for sending OCR requests to the local server.
- Included `curl_local_img.png`, a one-line pointer to a sample image path, for testing.
- Added `payload.json` for structured API requests to the OCR models.
- Enhanced logging and configuration checks in both daemon scripts.
zhch158_admin · 2 weeks ago · commit 5cdf258ef7

+ 210 - 2
ocr_tools/daemons/README.md

@@ -8,6 +8,8 @@
 
 ## Script List
 
+### Remote vLLM services (Linux/GPU)
+
 | Script File | Service Type | Default Port | Service URL |
 |---------|---------|---------|---------|
 | `mineru_vllm_daemon.sh` | MinerU vLLM | 8121 | http://localhost:8121 |
@@ -15,6 +17,13 @@
 | `paddle_vllm_daemon.sh` | PaddleOCR-VL vLLM | 8110 | http://localhost:8110 |
 | `dotsocr_vllm_daemon.sh` | DotsOCR vLLM | 8101 | http://localhost:8101 |
 
+### Local GGUF model services (macOS/Metal)
+
+| Script File | Service Type | Default Port | Service URL |
+|---------|---------|---------|---------|
+| `glmocr_local_daemon.sh` | GLM-OCR Q8_0 (llama.cpp) | 8080 | http://localhost:8080 |
+| `paddleocr_local_daemon.sh` | PaddleOCR-VL-1.5 (llama.cpp) | 8081 | http://localhost:8081 |
+
 ## Script-to-Client Tool Mapping
 
 | Server Script | Client Tool | Service Type | Default Port | API Endpoint |
@@ -210,14 +219,162 @@ cd ../dots.ocr_vl_tool
 python main.py --input document.pdf --output_dir ./output --ip localhost --port 8101
 ```
 
+### 5. glmocr_local_daemon.sh
+
+**Service type**: GLM-OCR local GGUF model service (macOS/Metal)
+
+**Configuration parameters**:
+- `CONDA_ENV`: conda environment name (default: `mineru2`)
+- `PORT`: service port (default: `8080`)
+- `HOST`: service host (default: `0.0.0.0`)
+- `MODEL_PATH`: GGUF model path (default: `~/Library/Caches/llama.cpp/ggml-org_GLM-OCR-GGUF_GLM-OCR-Q8_0.gguf`)
+- `MMPROJ_PATH`: multimodal projector path (default: `~/Library/Caches/llama.cpp/ggml-org_GLM-OCR-GGUF_mmproj-GLM-OCR-Q8_0.gguf`)
+- `CONTEXT_SIZE`: context length (default: `16384`)
+- `GPU_LAYERS`: number of Metal GPU layers (default: `99`, i.e. all)
+- `THREADS`: number of CPU threads (default: `8`)
+
+**How to start**:
+```bash
+./glmocr_local_daemon.sh start
+```
+
+**Service URLs**:
+- API endpoint: `http://localhost:8080`
+- OpenAI-compatible API: `http://localhost:8080/v1/chat/completions`
+- Models endpoint: `http://localhost:8080/v1/models`
+
+**Environment requirements**:
+- macOS (M4 Pro recommended)
+- llama.cpp installed via Homebrew: `brew install llama.cpp`
+- conda environment: `mineru2`
+- Model files located in: `~/Library/Caches/llama.cpp/`
+
+**Model sizes**:
+- Main model: 950 MB (GLM-OCR-Q8_0.gguf)
+- Multimodal projector: 484 MB (mmproj-GLM-OCR-Q8_0.gguf)
+
+**Client usage**:
+```bash
+# Call the service through a config file
+cd ../universal_doc_parser
+python parse.py --input document.pdf \
+  --config config/bank_statement_yusys_local.yaml --debug
+
+# Or call the API directly
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "glm-ocr",
+    "messages": [{
+      "role": "user",
+      "content": [
+        {"type": "text", "text": "Table Recognition:"},
+        {"type": "image_url", "image_url": {"url": "file://test.png"}}
+      ]
+    }],
+    "max_tokens": 16384
+  }'
+```
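+
+If setting up `--media-path` is inconvenient, an alternative is to inline the image as a base64 data URI, which is standard in the OpenAI vision format and which llama-server also accepts. A minimal sketch, assuming `test.png` exists in the current directory:
+
+```bash
+# Encode the image inline so no file-path mapping is needed
+IMG_B64=$(base64 < test.png | tr -d '\n')
+curl -s http://localhost:8080/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d "{
+    \"model\": \"glm-ocr\",
+    \"messages\": [{
+      \"role\": \"user\",
+      \"content\": [
+        {\"type\": \"text\", \"text\": \"Table Recognition:\"},
+        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/png;base64,${IMG_B64}\"}}
+      ]
+    }],
+    \"max_tokens\": 16384
+  }"
+```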
+
+**Highlights**:
+- 🚀 Runs locally, no network access required
+- 🍎 Metal GPU acceleration (macOS)
+- 📊 OpenAI-compatible API
+- 🎯 Deterministic output (--temp 0)
+- 💾 Low memory footprint (GGUF Q8_0 quantization)
+
+### 6. paddleocr_local_daemon.sh
+
+**Service type**: PaddleOCR-VL-1.5 local GGUF model service (macOS/Metal)
+
+**Configuration parameters**:
+- `CONDA_ENV`: conda environment name (default: `mineru2`)
+- `PORT`: service port (default: `8081`)
+- `HOST`: service host (default: `0.0.0.0`)
+- `MODEL_PATH`: GGUF model path (default: `~/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5.gguf`)
+- `MMPROJ_PATH`: multimodal projector path (default: `~/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5-mmproj.gguf`)
+- `CONTEXT_SIZE`: context length (default: `16384`)
+- `GPU_LAYERS`: number of Metal GPU layers (default: `99`, i.e. all)
+- `THREADS`: number of CPU threads (default: `8`)
+
+**How to start**:
+```bash
+./paddleocr_local_daemon.sh start
+```
+
+**Service URLs**:
+- API endpoint: `http://localhost:8081`
+- OpenAI-compatible API: `http://localhost:8081/v1/chat/completions`
+- Models endpoint: `http://localhost:8081/v1/models`
+
+**Environment requirements**:
+- macOS (M4 Pro recommended)
+- llama.cpp installed via Homebrew: `brew install llama.cpp`
+- conda environment: `mineru2`
+- Model files located in: `~/Library/Caches/llama.cpp/`
+
+**Model sizes**:
+- Main model: 936 MB (PaddleOCR-VL-1.5.gguf)
+- Multimodal projector: 882 MB (PaddleOCR-VL-1.5-mmproj.gguf)
+
+**Client usage**:
+```bash
+# Call the service through a config file
+cd ../universal_doc_parser
+python parse.py --input document.pdf \
+  --config config/bank_statement_paddleocr_local.yaml --debug
+
+# Or call the API directly
+curl -X POST http://localhost:8081/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "paddleocr-vl",
+    "messages": [{
+      "role": "user",
+      "content": [
+        {"type": "text", "text": "Table Recognition:"},
+        {"type": "image_url", "image_url": {"url": "file://test.png"}}
+      ]
+    }],
+    "max_tokens": 16384
+  }'
+```
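+
+Note that llama-server serves the single model it was loaded with and ignores the `model` field in the request body; to confirm which model is actually running, query the models endpoint. A quick check, assuming `jq` is installed:
+
+```bash
+curl -s http://localhost:8081/v1/models | jq -r '.data[].id'
+```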
+
+**Highlights**:
+- 🚀 Runs locally, no network access required
+- 🍎 Metal GPU acceleration (macOS)
+- 📊 OpenAI-compatible API
+- 🎯 Deterministic output (--temp 0)
+- 💾 Low memory footprint (GGUF quantization)
+
+**Side-by-side testing**:
+```bash
+# Both services can run at the same time for comparison
+./glmocr_local_daemon.sh start      # port 8080
+./paddleocr_local_daemon.sh start   # port 8081
+
+# Check status
+./glmocr_local_daemon.sh status
+./paddleocr_local_daemon.sh status
+```
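+
+To compare output quality on identical input, send the same request to both ports and diff the answers. A minimal sketch, assuming both services are running, `payload.json` (included in this directory) describes the request, and `jq` is installed:
+
+```bash
+# Send the same payload to both servers and diff the extracted answers
+for PORT in 8080 8081; do
+  curl -s "http://localhost:$PORT/v1/chat/completions" \
+    -H 'Content-Type: application/json' \
+    -d @payload.json \
+    | jq -r '.choices[0].message.content' > "result_$PORT.md"
+done
+diff result_8080.md result_8081.md
+```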
+
 ## Deployment Recommendations
 
 ### 1. Environment Preparation
 
+#### Linux/GPU environment (vLLM services)
 - Make sure all required conda environments are installed correctly
 - Make sure the model files have been downloaded and placed in the right locations
 - Make sure the GPU driver and CUDA are installed correctly
 
+#### macOS/Metal environment (local GGUF services)
+- Install llama.cpp: `brew install llama.cpp`
+- Make sure the conda environment `mineru2` has been created
+- Model files are downloaded automatically to `~/Library/Caches/llama.cpp/`, or can be fetched manually (see the sketch after this list):
+  - GLM-OCR: https://huggingface.co/ggml-org/GLM-OCR-GGUF
+  - PaddleOCR-VL: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF
+- Recommended hardware: Mac M4 Pro or better (at least 16 GB RAM)
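+
+The daemon scripts expect the `org_repo_file.gguf` names that llama.cpp's download cache produces. A hedged sketch of the manual route, assuming a llama.cpp build with `-hf` support (recent builds also fetch the matching mmproj file automatically): run each server once against the Hugging Face repo, then stop it after the download finishes.
+
+```bash
+# One-off downloads; Ctrl+C once each model has loaded
+llama-server -hf ggml-org/GLM-OCR-GGUF --port 8080
+llama-server -hf PaddlePaddle/PaddleOCR-VL-1.5-GGUF --port 8081
+# The files then land in ~/Library/Caches/llama.cpp/ under the names
+# the daemon scripts reference, and the scripts can take over
+```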
+
 ### 2. Configuration Tuning
 
 Before deploying, adjust the configuration parameters in each script to match the actual environment:
@@ -258,6 +415,7 @@ WantedBy=multi-user.target
 
 ### 5. Log Management
 
+#### vLLM service logs (Linux)
 Log files for all services are located in the `/home/ubuntu/zhch/logs/` directory:
 
 - `mineru_vllm.log` - MinerU vLLM service log
@@ -265,20 +423,34 @@ WantedBy=multi-user.target
 - `paddleocr_vl_vllm.log` - PaddleOCR-VL vLLM service log
 - `vllm.log` - DotsOCR vLLM service log
 
+#### Local GGUF service logs (macOS)
+Log files are located in the `~/workspace/logs/` directory:
+
+- `glmocr_llamaserver.log` - GLM-OCR llama-server log
+- `glmocr_llamaserver.pid` - GLM-OCR process PID
+- `paddleocr_llamaserver.log` - PaddleOCR-VL llama-server log
+- `paddleocr_llamaserver.pid` - PaddleOCR-VL process PID
+
 Clean up or rotate the log files regularly; a rotation sketch follows.
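+
+A copy-truncate rotation sketch for the macOS logs (assumed to run from cron or launchd; paths match the defaults above). Each daemon keeps its log file descriptor open, so truncate in place rather than moving the file:
+
+```bash
+for f in "$HOME"/workspace/logs/{glmocr,paddleocr}_llamaserver.log; do
+  [ -f "$f" ] || continue
+  cp "$f" "$f.$(date +%Y%m%d)"   # keep a dated copy
+  : > "$f"                       # truncate in place; the server keeps writing
+done
+```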
 
 ## Troubleshooting
 
 ### Problem: the service fails to start
 
-**Possible causes**:
+**Possible causes (vLLM services)**:
 1. The conda environment is not activated correctly
 2. Dependencies are not installed
 3. Model files are missing
 4. The port is already in use
 5. The GPU is unavailable or misconfigured
 
-**Solutions**:
+**Possible causes (local GGUF services)**:
+1. llama-server is not installed or the version is incompatible
+2. Model files are missing from `~/Library/Caches/llama.cpp/`
+3. The port is already in use
+4. The conda environment is not activated
+
+**Solutions (vLLM)**:
 1. Check the configuration with `./script_name.sh config`
 2. Check that the conda environment is activated: `conda env list`
 3. Check that dependencies are installed: `which python`, `which mineru-vllm-server`, etc.
@@ -286,6 +458,26 @@ WantedBy=multi-user.target
 5. Check port usage: `netstat -tuln | grep :PORT`
 6. Check GPU status: `nvidia-smi`
 
+**Solutions (local GGUF)**:
+1. Check llama-server: `which llama-server`, `llama-server --version`
+2. Check the model files: `ls -lh ~/Library/Caches/llama.cpp/`
+3. Check port usage: `lsof -i :8080` or `lsof -i :8081`
+4. Inspect the logs: `./glmocr_local_daemon.sh logs`
+5. Download the models manually if the cache directory is empty
+
+### Problem: llama-server file access errors
+
+**Possible causes**:
+1. An absolute path was used instead of a relative one
+2. --media-path is set incorrectly
+3. The file path contains Chinese or other special characters
+
+**Solutions**:
+1. Make sure --media-path points to the base directory (e.g. `/Users/zhch158/workspace`)
+2. Use image paths relative to that directory: `file://test.png` rather than `file:///Users/...` (see the sketch after this list)
+3. Avoid Chinese characters in paths
+4. Test with: `./glmocr_local_daemon.sh test`
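+
+To make the mapping concrete, a hypothetical layout: the daemon was started with `--media-path /Users/zhch158/workspace` and the image sits at `/Users/zhch158/workspace/data/test.png`:
+
+```bash
+# The server resolves file:// URLs relative to --media-path:
+MEDIA_PATH=/Users/zhch158/workspace
+REL=data/test.png                 # the request uses "file://data/test.png"
+ls -l "$MEDIA_PATH/$REL"          # verify the resolved target exists
+```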
+
 ### Problem: API not responding
 
 **Possible causes**:
@@ -333,9 +525,25 @@ WantedBy=multi-user.target
 
 ## Notes
 
+### vLLM services (Linux/GPU)
 1. **Path configuration**: paths in the scripts (model paths, log paths) must be adjusted to the actual deployment environment
 2. **Port conflicts**: make sure different services use different ports
 3. **GPU resources**: allocate GPU resources carefully so that multiple services do not compete for the same GPU
 4. **Log management**: clean up log files regularly to avoid filling the disk
 5. **Service monitoring**: use a monitoring tool (e.g. systemd, supervisor) to keep the services running reliably
 
+### Local GGUF services (macOS/Metal)
+1. **Port separation**: GLM-OCR uses 8080 and PaddleOCR-VL uses 8081, so both can run at the same time
+2. **File paths**: use paths relative to `--media-path`, not absolute paths
+3. **Memory requirements**: each service uses roughly 2-3 GB of RAM; make sure enough is available
+4. **Metal acceleration**: the Metal GPU is used automatically; no CUDA configuration is needed
+5. **Model downloads**: the first run may need to download the model files (about 1.5-2 GB)
+6. **Deterministic output**: `--temp 0` keeps OCR results consistent across runs
+7. **API compatibility**: fully compatible with the OpenAI vision API format
+
+### Model selection advice
+- **GLM-OCR**: suited to general OCR, especially English text and chart recognition
+- **PaddleOCR-VL**: suited to Chinese OCR, with strong table recognition
+- **Side-by-side testing**: run both services, compare the results, then choose
+- **Resource limits**: with limited memory (<16 GB), run only one service at a time
+

+ 1 - 0
ocr_tools/daemons/curl_local_img.png

@@ -0,0 +1 @@
+/Users/zhch158/workspace/data/流水分析/A用户_单元格扫描流水/bank_statement_yusys_v4/A用户_单元格扫描流水/A用户_单元格扫描流水_page_003.png

+ 18 - 0
ocr_tools/daemons/curl_local_ocr.sh

@@ -0,0 +1,18 @@
+# --media-path /Users/zhch158/workspace  # sets the base directory
+# {"url": "file://test_ocr_image.png"}   # a path relative to that base directory
+curl http://127.0.0.1:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ocr-vl",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {"type": "text", "text": "Table Recognition:"},
+          {"type": "image_url", "image_url": {"url": "file://repository.git/ocr_platform/ocr_tools/daemons/curl_local_img.png"}}
+        ]
+      }
+    ],
+    "max_tokens": 16384,
+    "temperature": 0.1
+  }'

+ 381 - 0
ocr_tools/daemons/glmocr_local_daemon.sh

@@ -0,0 +1,381 @@
+#!/bin/bash
+# filepath: ocr_platform/ocr_tools/daemons/glmocr_local_daemon.sh
+# Purpose: GLM-OCR local llama-server service (macOS), using a GGUF-format model
+# Intended for a Mac M4 Pro 48G, with Metal GPU acceleration
+# Model download: https://huggingface.co/ggml-org/GLM-OCR-GGUF
+
+# curl -X POST http://localhost:8080/v1/chat/completions -d @payload.json
+
+LOGDIR="$HOME/workspace/logs"
+mkdir -p $LOGDIR
+PIDFILE="$LOGDIR/glmocr_llamaserver.pid"
+LOGFILE="$LOGDIR/glmocr_llamaserver.log"
+
+# Configuration
+CONDA_ENV="mineru2"
+PORT="8080"
+HOST="0.0.0.0"
+
+# Local GGUF model paths
+MODEL_PATH="$HOME/Library/Caches/llama.cpp/ggml-org_GLM-OCR-GGUF_GLM-OCR-Q8_0.gguf"
+MMPROJ_PATH="$HOME/Library/Caches/llama.cpp/ggml-org_GLM-OCR-GGUF_mmproj-GLM-OCR-Q8_0.gguf"
+
+# llama-server parameters
+CONTEXT_SIZE="16384"         # context length (must be >= max_tokens; 8192-16384 recommended)
+GPU_LAYERS="99"              # number of Metal GPU layers (99 = all)
+THREADS="8"                  # CPU threads (suggested value for M4 Pro)
+BATCH_SIZE="512"             # batch size
+UBATCH_SIZE="128"            # micro-batch size
+
+# Activate the conda environment
+if [ -f "$HOME/anaconda3/etc/profile.d/conda.sh" ]; then
+    source "$HOME/anaconda3/etc/profile.d/conda.sh"
+    conda activate $CONDA_ENV
+elif [ -f "$HOME/miniconda3/etc/profile.d/conda.sh" ]; then
+    source "$HOME/miniconda3/etc/profile.d/conda.sh"
+    conda activate $CONDA_ENV
+elif [ -f "/opt/miniconda3/etc/profile.d/conda.sh" ]; then
+    source /opt/miniconda3/etc/profile.d/conda.sh
+    conda activate $CONDA_ENV
+else
+    echo "Warning: conda initialization file not found, trying direct path"
+    export PATH="/opt/miniconda3/envs/$CONDA_ENV/bin:$PATH"
+fi
+
+start() {
+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "GLM-OCR llama-server is already running"
+        return 1
+    fi
+
+    echo "Starting GLM-OCR llama-server daemon..."
+    echo "Host: $HOST, Port: $PORT"
+    echo "Main model: $MODEL_PATH"
+    echo "Multimodal projector: $MMPROJ_PATH"
+    echo "Context length: $CONTEXT_SIZE"
+    echo "GPU layers: $GPU_LAYERS (Metal)"
+    echo "Threads: $THREADS"
+
+    # Check that the model files exist
+    if [ ! -f "$MODEL_PATH" ]; then
+        echo "❌ Main model file not found: $MODEL_PATH"
+        echo "Please make sure the model has been downloaded to the llama.cpp cache directory"
+        return 1
+    fi
+
+    if [ ! -f "$MMPROJ_PATH" ]; then
+        echo "❌ Multimodal projector file not found: $MMPROJ_PATH"
+        echo "Please make sure the mmproj file has been downloaded"
+        return 1
+    fi
+
+    # Check for the llama-server command
+    if ! command -v llama-server >/dev/null 2>&1; then
+        echo "❌ llama-server not found"
+        echo "Install it with: brew install llama.cpp"
+        return 1
+    fi
+
+    echo "🔧 Using llama-server: $(which llama-server)"
+    echo "🔧 llama.cpp version: $(llama-server --version 2>&1 | head -1 || echo 'Unknown')"
+
+    echo "💻 System info:"
+    echo "  Architecture: $(uname -m)"
+    echo "  OS: $(uname -s)"
+    echo "  Memory: $(sysctl -n hw.memsize | awk '{printf "%.1f GB", $1/1024/1024/1024}')"
+
+    # Start llama-server
+    # --log-disable \
+    nohup llama-server \
+        -m "$MODEL_PATH" \
+        --mmproj "$MMPROJ_PATH" \
+        --host $HOST \
+        --port $PORT \
+        --media-path /Users/zhch158/workspace \
+        -c $CONTEXT_SIZE \
+        -ngl $GPU_LAYERS \
+        -t $THREADS \
+        -b $BATCH_SIZE \
+        -ub $UBATCH_SIZE \
+        --temp 0 \
+        > $LOGFILE 2>&1 &
+
+    echo $! > $PIDFILE
+    echo "✅ GLM-OCR llama-server started, PID: $(cat $PIDFILE)"
+    echo "📋 Log file: $LOGFILE"
+    echo "🌐 Service URL: http://$HOST:$PORT"
+    echo "📖 OpenAI-compatible API: http://localhost:$PORT/v1 (chat/completions, models)"
+    echo ""
+    echo "Waiting for the service to start..."
+    sleep 5
+    status
+}
+
+stop() {
+    if [ ! -f $PIDFILE ]; then
+        echo "GLM-OCR llama-server is not running"
+        return 1
+    fi
+
+    PID=$(cat $PIDFILE)
+    echo "Stopping GLM-OCR llama-server (PID: $PID)..."
+
+    kill $PID
+
+    for i in {1..30}; do
+        if ! kill -0 $PID 2>/dev/null; then
+            break
+        fi
+        echo "Waiting for the process to stop... ($i/30)"
+        sleep 1
+    done
+
+    if kill -0 $PID 2>/dev/null; then
+        echo "Force-killing the process..."
+        kill -9 $PID
+    fi
+
+    rm -f $PIDFILE
+    echo "✅ GLM-OCR llama-server stopped"
+}
+
+status() {
+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        PID=$(cat $PIDFILE)
+        echo "✅ GLM-OCR llama-server is running (PID: $PID)"
+        echo "🌐 Service URL: http://$HOST:$PORT"
+        echo "📋 Log file: $LOGFILE"
+
+        # Check whether the port is listening
+        if lsof -nP -iTCP:$PORT -sTCP:LISTEN >/dev/null 2>&1; then
+            echo "🔗 Port $PORT is listening"
+        else
+            echo "⚠️  Port $PORT is not listening (the service may still be starting)"
+        fi
+
+        # Check the API response
+        if command -v curl >/dev/null 2>&1; then
+            if curl -s --connect-timeout 2 http://127.0.0.1:$PORT/v1/models > /dev/null 2>&1; then
+                echo "🎯 API is responding"
+            else
+                echo "⚠️  API not responding (the service may still be starting)"
+            fi
+        fi
+
+        # Show process memory usage
+        if command -v ps >/dev/null 2>&1; then
+            MEM=$(ps -o rss= -p $PID 2>/dev/null | awk '{printf "%.2f GB", $1/1024/1024}')
+            if [ -n "$MEM" ]; then
+                echo "💾 Memory usage: $MEM"
+            fi
+        fi
+
+        if [ -f $LOGFILE ]; then
+            echo "📄 Recent log (last 3 lines):"
+            tail -3 $LOGFILE | sed 's/^/  /'
+        fi
+    else
+        echo "❌ GLM-OCR llama-server is not running"
+        if [ -f $PIDFILE ]; then
+            echo "Removing stale PID file..."
+            rm -f $PIDFILE
+        fi
+    fi
+}
+
+logs() {
+    if [ -f $LOGFILE ]; then
+        echo "📄 GLM-OCR llama-server log:"
+        echo "====================="
+        tail -f $LOGFILE
+    else
+        echo "❌ Log file not found: $LOGFILE"
+    fi
+}
+
+config() {
+    echo "📋 Current configuration:"
+    echo "  Conda env: $CONDA_ENV"
+    echo "  Host: $HOST"
+    echo "  Port: $PORT"
+    echo "  Main model path: $MODEL_PATH"
+    echo "  Multimodal projector: $MMPROJ_PATH"
+    echo "  Context length: $CONTEXT_SIZE"
+    echo "  GPU layers: $GPU_LAYERS"
+    echo "  Threads: $THREADS"
+    echo "  Batch size: $BATCH_SIZE"
+    echo "  Micro-batch size: $UBATCH_SIZE"
+    echo "  PID file: $PIDFILE"
+    echo "  Log file: $LOGFILE"
+
+    echo ""
+    echo "📦 Model file check:"
+    if [ -f "$MODEL_PATH" ]; then
+        SIZE=$(du -h "$MODEL_PATH" | cut -f1)
+        echo "  ✅ Main model present ($SIZE)"
+    else
+        echo "  ❌ Main model missing"
+    fi
+
+    if [ -f "$MMPROJ_PATH" ]; then
+        SIZE=$(du -h "$MMPROJ_PATH" | cut -f1)
+        echo "  ✅ Multimodal projector present ($SIZE)"
+    else
+        echo "  ❌ Multimodal projector missing"
+    fi
+
+    echo ""
+    echo "🔧 Environment check:"
+    echo "  llama-server: $(which llama-server 2>/dev/null || echo 'not installed')"
+    if command -v llama-server >/dev/null 2>&1; then
+        LLAMA_VERSION=$(llama-server --version 2>&1 | head -1 || echo 'Unknown')
+        echo "  Version: $LLAMA_VERSION"
+    fi
+    echo "  Conda: $(which conda 2>/dev/null || echo 'not found')"
+    echo "  Current Python: $(which python 2>/dev/null || echo 'not found')"
+
+    echo ""
+    echo "💻 System info:"
+    echo "  Architecture: $(uname -m)"
+    echo "  OS version: $(sw_vers -productVersion 2>/dev/null || echo 'Unknown')"
+    echo "  Total memory: $(sysctl -n hw.memsize 2>/dev/null | awk '{printf "%.1f GB", $1/1024/1024/1024}' || echo 'Unknown')"
+    echo "  CPU cores: $(sysctl -n hw.ncpu 2>/dev/null || echo 'Unknown')"
+}
+
+test_api() {
+    echo "🧪 Testing the GLM-OCR llama-server API..."
+
+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "❌ GLM-OCR llama-server is not running"
+        return 1
+    fi
+
+    if ! command -v curl >/dev/null 2>&1; then
+        echo "❌ curl command not found"
+        return 1
+    fi
+
+    echo "📡 Testing the /v1/models endpoint..."
+    response=$(curl -s --connect-timeout 10 http://127.0.0.1:$PORT/v1/models)
+    if [ $? -eq 0 ]; then
+        echo "✅ Models endpoint reachable"
+        echo "$response" | python -m json.tool 2>/dev/null || echo "$response"
+    else
+        echo "❌ Models endpoint unreachable"
+    fi
+
+    echo ""
+    echo "📡 Testing the /health endpoint..."
+    health=$(curl -s --connect-timeout 5 http://127.0.0.1:$PORT/health)
+    if [ $? -eq 0 ]; then
+        echo "✅ Health endpoint: $health"
+    else
+        echo "⚠️  Health endpoint unreachable"
+    fi
+}
+
+test_client() {
+    echo "🧪 Testing GLM-OCR integration with llama-server..."
+
+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "❌ GLM-OCR llama-server is not running; start it first: $0 start"
+        return 1
+    fi
+
+    CONFIG_FILE="/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/config/bank_statement_yusys_local.yaml"
+
+    echo "📄 Config file: $CONFIG_FILE"
+    echo ""
+    echo "Make sure vl_recognition.api_url in the config file points to: http://localhost:$PORT/v1/chat/completions"
+    echo ""
+    echo "Example test commands:"
+    echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
+    echo "  conda activate mineru2"
+    echo "  python parse.py --input /path/to/test/image.png --config $CONFIG_FILE --debug"
+    echo ""
+    echo "Or test the API directly with curl:"
+    echo "  curl -X POST http://localhost:$PORT/v1/chat/completions \\"
+    echo "    -H 'Content-Type: application/json' \\"
+    echo "    -d '{"
+    echo "      \"model\": \"glm-ocr\","
+    echo "      \"messages\": ["
+    echo "        {"
+    echo "          \"role\": \"user\","
+    echo "          \"content\": ["
+    echo "            {\"type\": \"text\", \"text\": \"Table Recognition:\"},"
+    echo "            {\"type\": \"image_url\", \"image_url\": {\"url\": \"file:///path/to/image.png\"}}"
+    echo "          ]"
+    echo "        }"
+    echo "      ],"
+    echo "      \"max_tokens\": 4096"
+    echo "    }'"
+}
+
+usage() {
+    echo "GLM-OCR llama-server daemon (macOS)"
+    echo "==========================================="
+    echo "Usage: $0 {start|stop|restart|status|logs|config|test|test-client}"
+    echo ""
+    echo "Commands:"
+    echo "  start       - start the GLM-OCR llama-server service"
+    echo "  stop        - stop the GLM-OCR llama-server service"
+    echo "  restart     - restart the GLM-OCR llama-server service"
+    echo "  status      - show service status and resource usage"
+    echo "  logs        - show the service log (follow mode)"
+    echo "  config      - show the current configuration"
+    echo "  test        - test the /v1/models API endpoint"
+    echo "  test-client - show how to test the config-file integration"
+    echo ""
+    echo "Configuration (edit the script to change):"
+    echo "  Host: $HOST"
+    echo "  Port: $PORT"
+    echo "  Main model: $MODEL_PATH"
+    echo "  Multimodal projector: $MMPROJ_PATH"
+    echo "  Context length: $CONTEXT_SIZE"
+    echo "  GPU layers: $GPU_LAYERS (Metal)"
+    echo ""
+    echo "Examples:"
+    echo "  ./glmocr_local_daemon.sh start"
+    echo "  ./glmocr_local_daemon.sh status"
+    echo "  ./glmocr_local_daemon.sh logs"
+    echo "  ./glmocr_local_daemon.sh test"
+    echo ""
+    echo "Prerequisites:"
+    echo "  1. Install llama.cpp: brew install llama.cpp"
+    echo "  2. Model files in: ~/Library/Caches/llama.cpp/"
+    echo "  3. conda environment mineru2 configured"
+}
+
+case "$1" in
+    start)
+        start
+        ;;
+    stop)
+        stop
+        ;;
+    restart)
+        stop
+        sleep 3
+        start
+        ;;
+    status)
+        status
+        ;;
+    logs)
+        logs
+        ;;
+    config)
+        config
+        ;;
+    test)
+        test_api
+        ;;
+    test-client)
+        test_client
+        ;;
+    *)
+        usage
+        exit 1
+        ;;
+esac

+ 378 - 0
ocr_tools/daemons/paddle_local_daemon.sh

@@ -0,0 +1,378 @@
+#!/bin/bash
+# filepath: ocr_platform/ocr_tools/daemons/paddleocr_local_daemon.sh
+# Purpose: PaddleOCR-VL local llama-server service (macOS), using a GGUF-format model
+# Intended for a Mac M4 Pro 48G, with Metal GPU acceleration
+# Model download: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF
+# curl -X POST http://localhost:8081/v1/chat/completions -d @payload.json
+
+LOGDIR="$HOME/workspace/logs"
+mkdir -p $LOGDIR
+PIDFILE="$LOGDIR/paddleocr_llamaserver.pid"
+LOGFILE="$LOGDIR/paddleocr_llamaserver.log"
+
+# Configuration
+CONDA_ENV="mineru2"
+PORT="8081"
+HOST="0.0.0.0"
+
+# Local GGUF model paths
+MODEL_PATH="$HOME/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5.gguf"
+MMPROJ_PATH="$HOME/Library/Caches/llama.cpp/PaddlePaddle_PaddleOCR-VL-1.5-GGUF_PaddleOCR-VL-1.5-mmproj.gguf"
+
+# llama-server parameters
+CONTEXT_SIZE="16384"         # context length (must be >= max_tokens; 8192-16384 recommended)
+GPU_LAYERS="99"              # number of Metal GPU layers (99 = all)
+THREADS="8"                  # CPU threads (suggested value for M4 Pro)
+BATCH_SIZE="512"             # batch size
+UBATCH_SIZE="128"            # micro-batch size
+
+# Activate the conda environment
+if [ -f "$HOME/anaconda3/etc/profile.d/conda.sh" ]; then
+    source "$HOME/anaconda3/etc/profile.d/conda.sh"
+    conda activate $CONDA_ENV
+elif [ -f "$HOME/miniconda3/etc/profile.d/conda.sh" ]; then
+    source "$HOME/miniconda3/etc/profile.d/conda.sh"
+    conda activate $CONDA_ENV
+elif [ -f "/opt/miniconda3/etc/profile.d/conda.sh" ]; then
+    source /opt/miniconda3/etc/profile.d/conda.sh
+    conda activate $CONDA_ENV
+else
+    echo "Warning: conda initialization file not found, trying direct path"
+    export PATH="/opt/miniconda3/envs/$CONDA_ENV/bin:$PATH"
+fi
+
+start() {
+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "PaddleOCR-VL llama-server is already running"
+        return 1
+    fi
+
+    echo "Starting PaddleOCR-VL llama-server daemon..."
+    echo "Host: $HOST, Port: $PORT"
+    echo "Main model: $MODEL_PATH"
+    echo "Multimodal projector: $MMPROJ_PATH"
+    echo "Context length: $CONTEXT_SIZE"
+    echo "GPU layers: $GPU_LAYERS (Metal)"
+    echo "Threads: $THREADS"
+
+    # Check that the model files exist
+    if [ ! -f "$MODEL_PATH" ]; then
+        echo "❌ Main model file not found: $MODEL_PATH"
+        echo "Please make sure the model has been downloaded to the llama.cpp cache directory"
+        return 1
+    fi
+
+    if [ ! -f "$MMPROJ_PATH" ]; then
+        echo "❌ Multimodal projector file not found: $MMPROJ_PATH"
+        echo "Please make sure the mmproj file has been downloaded"
+        return 1
+    fi
+
+    # Check for the llama-server command
+    if ! command -v llama-server >/dev/null 2>&1; then
+        echo "❌ llama-server not found"
+        echo "Install it with: brew install llama.cpp"
+        return 1
+    fi
+
+    echo "🔧 Using llama-server: $(which llama-server)"
+    echo "🔧 llama.cpp version: $(llama-server --version 2>&1 | head -1 || echo 'Unknown')"
+
+    echo "💻 System info:"
+    echo "  Architecture: $(uname -m)"
+    echo "  OS: $(uname -s)"
+    echo "  Memory: $(sysctl -n hw.memsize | awk '{printf "%.1f GB", $1/1024/1024/1024}')"
+
+    # Start llama-server
+    nohup llama-server \
+        -m "$MODEL_PATH" \
+        --mmproj "$MMPROJ_PATH" \
+        --host $HOST \
+        --port $PORT \
+        --media-path /Users/zhch158/workspace \
+        -c $CONTEXT_SIZE \
+        -ngl $GPU_LAYERS \
+        -t $THREADS \
+        -b $BATCH_SIZE \
+        -ub $UBATCH_SIZE \
+        --temp 0 \
+        > $LOGFILE 2>&1 &
+
+    echo $! > $PIDFILE
+    echo "✅ PaddleOCR-VL llama-server started, PID: $(cat $PIDFILE)"
+    echo "📋 Log file: $LOGFILE"
+    echo "🌐 Service URL: http://$HOST:$PORT"
+    echo "📖 OpenAI-compatible API: http://localhost:$PORT/v1 (chat/completions, models)"
+    echo ""
+    echo "Waiting for the service to start..."
+    sleep 5
+    status
+}
+
+stop() {
+    if [ ! -f $PIDFILE ]; then
+        echo "PaddleOCR-VL llama-server is not running"
+        return 1
+    fi
+
+    PID=$(cat $PIDFILE)
+    echo "Stopping PaddleOCR-VL llama-server (PID: $PID)..."
+
+    kill $PID
+
+    for i in {1..30}; do
+        if ! kill -0 $PID 2>/dev/null; then
+            break
+        fi
+        echo "Waiting for the process to stop... ($i/30)"
+        sleep 1
+    done
+
+    if kill -0 $PID 2>/dev/null; then
+        echo "Force-killing the process..."
+        kill -9 $PID
+    fi
+
+    rm -f $PIDFILE
+    echo "✅ PaddleOCR-VL llama-server stopped"
+}
+
+status() {
+    if [ -f $PIDFILE ] && kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        PID=$(cat $PIDFILE)
+        echo "✅ PaddleOCR-VL llama-server is running (PID: $PID)"
+        echo "🌐 Service URL: http://$HOST:$PORT"
+        echo "📋 Log file: $LOGFILE"
+
+        # Check whether the port is listening
+        if lsof -nP -iTCP:$PORT -sTCP:LISTEN >/dev/null 2>&1; then
+            echo "🔗 Port $PORT is listening"
+        else
+            echo "⚠️  Port $PORT is not listening (the service may still be starting)"
+        fi
+
+        # Check the API response
+        if command -v curl >/dev/null 2>&1; then
+            if curl -s --connect-timeout 2 http://127.0.0.1:$PORT/v1/models > /dev/null 2>&1; then
+                echo "🎯 API is responding"
+            else
+                echo "⚠️  API not responding (the service may still be starting)"
+            fi
+        fi
+
+        # Show process memory usage
+        if command -v ps >/dev/null 2>&1; then
+            MEM=$(ps -o rss= -p $PID 2>/dev/null | awk '{printf "%.2f GB", $1/1024/1024}')
+            if [ -n "$MEM" ]; then
+                echo "💾 Memory usage: $MEM"
+            fi
+        fi
+
+        if [ -f $LOGFILE ]; then
+            echo "📄 Recent log (last 3 lines):"
+            tail -3 $LOGFILE | sed 's/^/  /'
+        fi
+    else
+        echo "❌ PaddleOCR-VL llama-server is not running"
+        if [ -f $PIDFILE ]; then
+            echo "Removing stale PID file..."
+            rm -f $PIDFILE
+        fi
+    fi
+}
+
+logs() {
+    if [ -f $LOGFILE ]; then
+        echo "📄 PaddleOCR-VL llama-server log:"
+        echo "====================="
+        tail -f $LOGFILE
+    else
+        echo "❌ Log file not found: $LOGFILE"
+    fi
+}
+
+config() {
+    echo "📋 Current configuration:"
+    echo "  Conda env: $CONDA_ENV"
+    echo "  Host: $HOST"
+    echo "  Port: $PORT"
+    echo "  Main model path: $MODEL_PATH"
+    echo "  Multimodal projector: $MMPROJ_PATH"
+    echo "  Context length: $CONTEXT_SIZE"
+    echo "  GPU layers: $GPU_LAYERS"
+    echo "  Threads: $THREADS"
+    echo "  Batch size: $BATCH_SIZE"
+    echo "  Micro-batch size: $UBATCH_SIZE"
+    echo "  PID file: $PIDFILE"
+    echo "  Log file: $LOGFILE"
+
+    echo ""
+    echo "📦 Model file check:"
+    if [ -f "$MODEL_PATH" ]; then
+        SIZE=$(du -h "$MODEL_PATH" | cut -f1)
+        echo "  ✅ Main model present ($SIZE)"
+    else
+        echo "  ❌ Main model missing"
+    fi
+
+    if [ -f "$MMPROJ_PATH" ]; then
+        SIZE=$(du -h "$MMPROJ_PATH" | cut -f1)
+        echo "  ✅ Multimodal projector present ($SIZE)"
+    else
+        echo "  ❌ Multimodal projector missing"
+    fi
+
+    echo ""
+    echo "🔧 Environment check:"
+    echo "  llama-server: $(which llama-server 2>/dev/null || echo 'not installed')"
+    if command -v llama-server >/dev/null 2>&1; then
+        LLAMA_VERSION=$(llama-server --version 2>&1 | head -1 || echo 'Unknown')
+        echo "  Version: $LLAMA_VERSION"
+    fi
+    echo "  Conda: $(which conda 2>/dev/null || echo 'not found')"
+    echo "  Current Python: $(which python 2>/dev/null || echo 'not found')"
+
+    echo ""
+    echo "💻 System info:"
+    echo "  Architecture: $(uname -m)"
+    echo "  OS version: $(sw_vers -productVersion 2>/dev/null || echo 'Unknown')"
+    echo "  Total memory: $(sysctl -n hw.memsize 2>/dev/null | awk '{printf "%.1f GB", $1/1024/1024/1024}' || echo 'Unknown')"
+    echo "  CPU cores: $(sysctl -n hw.ncpu 2>/dev/null || echo 'Unknown')"
+}
+
+test_api() {
+    echo "🧪 Testing the PaddleOCR-VL llama-server API..."
+
+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "❌ PaddleOCR-VL llama-server is not running"
+        return 1
+    fi
+
+    if ! command -v curl >/dev/null 2>&1; then
+        echo "❌ curl command not found"
+        return 1
+    fi
+
+    echo "📡 Testing the /v1/models endpoint..."
+    response=$(curl -s --connect-timeout 10 http://127.0.0.1:$PORT/v1/models)
+    if [ $? -eq 0 ]; then
+        echo "✅ Models endpoint reachable"
+        echo "$response" | python -m json.tool 2>/dev/null || echo "$response"
+    else
+        echo "❌ Models endpoint unreachable"
+    fi
+
+    echo ""
+    echo "📡 Testing the /health endpoint..."
+    health=$(curl -s --connect-timeout 5 http://127.0.0.1:$PORT/health)
+    if [ $? -eq 0 ]; then
+        echo "✅ Health endpoint: $health"
+    else
+        echo "⚠️  Health endpoint unreachable"
+    fi
+}
+
+test_client() {
+    echo "🧪 Testing PaddleOCR-VL integration with llama-server..."
+
+    if [ ! -f $PIDFILE ] || ! kill -0 $(cat $PIDFILE) 2>/dev/null; then
+        echo "❌ PaddleOCR-VL llama-server is not running; start it first: $0 start"
+        return 1
+    fi
+
+    CONFIG_FILE="/Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser/config/bank_statement_paddleocr_local.yaml"
+
+    echo "📄 Config file: $CONFIG_FILE"
+    echo ""
+    echo "Make sure vl_recognition.api_url in the config file points to: http://localhost:$PORT/v1/chat/completions"
+    echo ""
+    echo "Example test commands:"
+    echo "  cd /Users/zhch158/workspace/repository.git/ocr_platform/ocr_tools/universal_doc_parser"
+    echo "  conda activate mineru2"
+    echo "  python parse.py --input /path/to/test/image.png --config $CONFIG_FILE --debug"
+    echo ""
+    echo "Or test the API directly with curl:"
+    echo "  curl -X POST http://localhost:$PORT/v1/chat/completions \\"
+    echo "    -H 'Content-Type: application/json' \\"
+    echo "    -d '{"
+    echo "      \"model\": \"paddleocr-vl\","
+    echo "      \"messages\": ["
+    echo "        {"
+    echo "          \"role\": \"user\","
+    echo "          \"content\": ["
+    echo "            {\"type\": \"text\", \"text\": \"Table Recognition:\"},"
+    echo "            {\"type\": \"image_url\", \"image_url\": {\"url\": \"file:///path/to/image.png\"}}"
+    echo "          ]"
+    echo "        }"
+    echo "      ],"
+    echo "      \"max_tokens\": 4096"
+    echo "    }'"
+}
+
+usage() {
+    echo "PaddleOCR-VL llama-server daemon (macOS)"
+    echo "==========================================="
+    echo "Usage: $0 {start|stop|restart|status|logs|config|test|test-client}"
+    echo ""
+    echo "Commands:"
+    echo "  start       - start the PaddleOCR-VL llama-server service"
+    echo "  stop        - stop the PaddleOCR-VL llama-server service"
+    echo "  restart     - restart the PaddleOCR-VL llama-server service"
+    echo "  status      - show service status and resource usage"
+    echo "  logs        - show the service log (follow mode)"
+    echo "  config      - show the current configuration"
+    echo "  test        - test the /v1/models API endpoint"
+    echo "  test-client - show how to test the config-file integration"
+    echo ""
+    echo "Configuration (edit the script to change):"
+    echo "  Host: $HOST"
+    echo "  Port: $PORT"
+    echo "  Main model: $MODEL_PATH"
+    echo "  Multimodal projector: $MMPROJ_PATH"
+    echo "  Context length: $CONTEXT_SIZE"
+    echo "  GPU layers: $GPU_LAYERS (Metal)"
+    echo ""
+    echo "Examples:"
+    echo "  ./paddleocr_local_daemon.sh start"
+    echo "  ./paddleocr_local_daemon.sh status"
+    echo "  ./paddleocr_local_daemon.sh logs"
+    echo "  ./paddleocr_local_daemon.sh test"
+    echo ""
+    echo "Prerequisites:"
+    echo "  1. Install llama.cpp: brew install llama.cpp"
+    echo "  2. Model files in: ~/Library/Caches/llama.cpp/"
+    echo "  3. conda environment mineru2 configured"
+}
+
+case "$1" in
+    start)
+        start
+        ;;
+    stop)
+        stop
+        ;;
+    restart)
+        stop
+        sleep 3
+        start
+        ;;
+    status)
+        status
+        ;;
+    logs)
+        logs
+        ;;
+    config)
+        config
+        ;;
+    test)
+        test_api
+        ;;
+    test-client)
+        test_client
+        ;;
+    *)
+        usage
+        exit 1
+        ;;
+esac

+ 22 - 0
ocr_tools/daemons/payload.json

@@ -0,0 +1,22 @@
+{
+	"model": "ocr-vl",
+	"messages": [
+		{
+			"role": "user",
+			"content": [
+				{
+					"type": "text",
+					"text": "Table Recognition:"
+				},
+				{
+					"type": "image_url",
+					"image_url": {
+						"url": "file://repository.git/ocr_platform/ocr_tools/daemons/curl_local_img.png"
+					}
+				}
+			]
+		}
+	],
+	"max_tokens": 16384,
+	"temperature": 0.1
+}