You are **exactly right**! 🎯 These really are two separate processes, so the hook never takes effect. Here are approaches that do work:

---

## Problem analysis

```bash
# ❌ Process 1: preload the adapter
python3 -c "import sys; sys.path.insert(0, 'zhch'); import paddlex_adapter_hook"

# ❌ Process 2: start the service (a brand-new process that inherits nothing from process 1)
paddlex --serve --port 8111 --device "gpu:3" --pipeline ...
```

**Problem**:
- Process 1 loads the adapter and then exits immediately.
- Process 2 is a brand-new Python process; it inherits none of process 1's in-memory state (imported modules, monkey patches).
- The adapter therefore never takes effect in the service.

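
The isolation described above can be reproduced without PaddleX. The following is a minimal sketch, assuming only the standard library: the first child interpreter monkey-patches `math.sqrt` (a stand-in for the adapter patch) and exits, and the second child interpreter sees the unpatched function.

```python
import subprocess
import sys

# Process 1: monkey-patch a stdlib function (stand-in for the adapter), then exit.
patch_code = "import math; math.sqrt = lambda x: -1; print(math.sqrt(4))"
# Process 2: a fresh interpreter that knows nothing about the patch.
check_code = "import math; print(math.sqrt(4))"

first = subprocess.run([sys.executable, "-c", patch_code],
                       capture_output=True, text=True)
second = subprocess.run([sys.executable, "-c", check_code],
                        capture_output=True, text=True)

print("process 1 sees:", first.stdout.strip())   # patched value: -1
print("process 2 sees:", second.stdout.strip())  # original value: 2.0
```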
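---

## ✅ The fix: auto-load with `sitecustomize.py`

At interpreter startup, Python's `site` module tries to import a module named `sitecustomize` (unless Python is run with `-S`). We can use that mechanism to load the adapter inside the service process itself.
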
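
To see the mechanism in isolation, here is a small sketch that assumes nothing about PaddleX: it writes a throwaway `sitecustomize.py` into a temporary directory, puts that directory on `PYTHONPATH`, and starts a child interpreter. The environment variable name `DEMO_HOOK_RAN` is made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway sitecustomize.py that leaves a visible trace in the child process.
    with open(os.path.join(tmp, "sitecustomize.py"), "w") as f:
        f.write("import os\nos.environ['DEMO_HOOK_RAN'] = '1'\n")

    # The child interpreter finds sitecustomize.py through PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=tmp)
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('DEMO_HOOK_RAN'))"],
        env=env, capture_output=True, text=True,
    )
    hook_result = child.stdout.strip()

# "1" if the child interpreter imported the temporary sitecustomize module
print(hook_result)
```

The child never imports the file explicitly; the `site` module does it during startup, which is why this works for a command such as `paddlex` that we do not control.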
### Option 1: global sitecustomize.py

Create `zhch/sitecustomize.py`:

```python
"""
PaddleX adapter auto-load hook.
Placed on the Python search path, it runs automatically when any Python process starts.
"""
import importlib.abc
import importlib.util
import os
import sys

TARGET_MODULE = "paddlex.inference.pipelines.table_recognition.pipeline_v2"


def activate_adapter():
    """Import and apply the adapter; a failure must never break interpreter startup."""
    try:
        from adapters.table_recognition_adapter import apply_table_recognition_adapter
        if apply_table_recognition_adapter():
            print("✅ [sitecustomize] Table recognition adapter activated")
    except Exception as e:
        print(f"⚠️ [sitecustomize] Failed to activate adapter: {e}")


class AdapterFinder(importlib.abc.MetaPathFinder):
    """Watches for the pipeline_v2 import and activates the adapter right after it."""

    # find_spec replaces find_module, which newer Python versions no longer call
    def find_spec(self, fullname, path=None, target=None):
        if fullname != TARGET_MODULE:
            return None
        # One-shot: step aside so the regular finders resolve the module (no recursion)
        sys.meta_path.remove(self)
        spec = importlib.util.find_spec(fullname)
        if spec is None or not hasattr(spec.loader, "exec_module"):
            return spec
        original_exec = spec.loader.exec_module

        def exec_then_activate(module):
            # The finder runs before the module body executes, so defer activation
            original_exec(module)
            activate_adapter()

        spec.loader.exec_module = exec_then_activate
        return spec


def load_paddlex_adapters():
    """Install the import hook for the PaddleX adapter."""
    # Only activate when the environment variable is set
    if os.getenv("PADDLEX_ENABLE_TABLE_ADAPTER", "").lower() not in ("true", "1", "yes"):
        return

    try:
        # Make sure the adapter path is on sys.path
        adapter_base = os.path.dirname(os.path.abspath(__file__))
        adapters_path = os.path.join(adapter_base, "adapters")
        if adapters_path not in sys.path:
            sys.path.insert(0, adapters_path)

        # Listen for the PaddleX import
        sys.meta_path.insert(0, AdapterFinder())
    except Exception as e:
        print(f"⚠️ [sitecustomize] Error setting up adapter hook: {e}")


# Run automatically
load_paddlex_adapters()
```

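
A meta path finder is consulted before the target module's body runs, so a hook that has to act after the import needs to wrap the loader rather than act inside the finder. The sketch below exercises that pattern on a stdlib module so it can be tried without PaddleX. `PostImportFinder` is a hypothetical name, and `colorsys` merely stands in for the pipeline module.

```python
import importlib.abc
import importlib.util
import sys

calls = []


class PostImportFinder(importlib.abc.MetaPathFinder):
    """Runs `callback(module)` once, right after `target` has been executed."""

    def __init__(self, target, callback):
        self.target = target
        self.callback = callback

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.target:
            return None
        # Remove ourselves first so the nested lookup below cannot recurse.
        sys.meta_path.remove(self)
        spec = importlib.util.find_spec(fullname)
        if spec is None or not hasattr(spec.loader, "exec_module"):
            return spec
        original_exec = spec.loader.exec_module

        def exec_then_callback(module):
            original_exec(module)   # run the real module body
            self.callback(module)   # then the post-import hook

        spec.loader.exec_module = exec_then_callback
        return spec


sys.modules.pop("colorsys", None)  # make sure the import below is a fresh one
sys.meta_path.insert(0, PostImportFinder("colorsys", lambda m: calls.append(m.__name__)))

import colorsys

print(calls)  # the callback ran once, after the module body
```

Wrapping `exec_module` keeps the module loaded exactly once; calling the hook from the finder and returning `None` would either run too early or trigger a second load.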

### Modify the startup script `start_table_recognition_service.sh`:

```bash
#!/bin/bash
# filepath: zhch/start_table_recognition_service.sh

# 🎯 Enable the adapter through an environment variable
export PADDLEX_ENABLE_TABLE_ADAPTER="true"

# 🎯 Key step: put the zhch directory on PYTHONPATH so sitecustomize.py is discovered automatically
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Append the previous value only when it is non-empty, to avoid a stray empty entry
export PYTHONPATH="$SCRIPT_DIR${PYTHONPATH:+:$PYTHONPATH}"

# Set the model download source
export PADDLE_PDX_MODEL_SOURCE="bos"

# Suppress UserWarning messages
export PYTHONWARNINGS="ignore::UserWarning"

# Read the arguments
PORT=${1:-8111}
DEVICE=${2:-"gpu:3"}
PIPELINE_CONFIG=${3:-"zhch/my_config/table_recognition_v2.yaml"}

echo "🚀 Starting Table Recognition Service with Enhanced Adapter..."
echo "   - Port: $PORT"
echo "   - Device: $DEVICE"
echo "   - Pipeline: $PIPELINE_CONFIG"
echo "   - Adapter: ENABLED (via sitecustomize.py)"
echo "   - PYTHONPATH: $PYTHONPATH"

# Start the service (sitecustomize.py is loaded automatically)
nohup paddlex --serve \
    --port "$PORT" \
    --device "$DEVICE" \
    --pipeline "$PIPELINE_CONFIG" \
    > "table_recognition_service_$(date +%Y%m%d_%H%M%S).log" 2>&1 &

PID=$!
echo "✅ Service started! PID: $PID"
echo "📝 Logs: tail -f table_recognition_service_*.log"
echo ""
echo "🛑 To stop: kill $PID"
```

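
Only one module named `sitecustomize` is imported per interpreter, and the first match on `sys.path` wins. If the environment already provides one (some Linux distributions and tools do), it is worth checking which file Python would pick up. A minimal check, meant to be run with the same `PYTHONPATH` that the script exports:

```python
import importlib.util

# Locate the module that `import sitecustomize` resolves to in this interpreter.
spec = importlib.util.find_spec("sitecustomize")
location = spec.origin if spec is not None else None

# Expect the path to zhch/sitecustomize.py when PYTHONPATH is set as in the script.
print(location or "no sitecustomize module found on sys.path")
```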

---

## Option 2: launch through a wrapper script (alternative)

If `sitecustomize.py` does not take effect, use a Python wrapper script.

Create `zhch/start_paddlex_with_adapter.py`:

```python
#!/usr/bin/env python3
"""
PaddleX service launcher wrapper: activates the adapter automatically.
"""
import os
import sys


def main():
    # 🎯 Activate the adapter in the current process
    if os.getenv("PADDLEX_ENABLE_TABLE_ADAPTER", "").lower() in ("true", "1", "yes"):
        script_dir = os.path.dirname(os.path.abspath(__file__))
        sys.path.insert(0, os.path.join(script_dir, "adapters"))
        try:
            from table_recognition_adapter import apply_table_recognition_adapter
            if apply_table_recognition_adapter():
                print("✅ Adapter activated in wrapper process")
        except Exception as e:
            print(f"⚠️ Failed to activate adapter: {e}")

    # 🎯 Collect the command-line arguments (drop this script's own name)
    args = sys.argv[1:]

    # 🎯 Import the PaddleX CLI entry point and run it in the current process
    from paddlex.paddlex_cli import main as paddlex_main

    # Replace sys.argv so the CLI sees the forwarded arguments
    sys.argv = ["paddlex"] + args

    # Run PaddleX
    paddlex_main()


if __name__ == "__main__":
    main()
```

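
The wrapper relies on one general technique: a CLI entry point that reads `sys.argv` can be called in-process after `sys.argv` has been rewritten. The sketch below shows that technique with a stand-in `argparse` CLI; `fake_cli_main` and `run_in_process` are hypothetical names, and nothing here is PaddleX's actual interface.

```python
import argparse
import sys


def fake_cli_main():
    """Stand-in for a console-script entry point that parses sys.argv."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--serve", action="store_true")
    parser.add_argument("--port", type=int, default=8080)
    return parser.parse_args()  # reads sys.argv[1:]


def run_in_process(cli_main, argv):
    """Call `cli_main` in this process as if it had been started with `argv`."""
    saved = sys.argv
    sys.argv = ["demo"] + list(argv)
    try:
        return cli_main()
    finally:
        sys.argv = saved  # restore the caller's arguments


ns = run_in_process(fake_cli_main, ["--serve", "--port", "8111"])
print(ns.serve, ns.port)  # True 8111
```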

Modify the startup script to use the wrapper:

```bash
#!/bin/bash
# filepath: zhch/start_table_recognition_service.sh

export PADDLEX_ENABLE_TABLE_ADAPTER="true"
export PADDLE_PDX_MODEL_SOURCE="bos"
export PYTHONWARNINGS="ignore::UserWarning"

PORT=${1:-8111}
DEVICE=${2:-"gpu:3"}
PIPELINE_CONFIG=${3:-"zhch/my_config/table_recognition_v2.yaml"}

echo "🚀 Starting Table Recognition Service with Adapter Wrapper..."

# 🎯 Start through the wrapper script (adapter and service share one process)
nohup python3 zhch/start_paddlex_with_adapter.py \
    --serve \
    --port "$PORT" \
    --device "$DEVICE" \
    --pipeline "$PIPELINE_CONFIG" \
    > "table_recognition_service_$(date +%Y%m%d_%H%M%S).log" 2>&1 &

PID=$!
echo "✅ Service started! PID: $PID"
```


---

## Option 3: make the adapter activate lazily (simplest)

Modify the auto-activation logic in [`zhch/adapters/table_recognition_adapter.py`](zhch/adapters/table_recognition_adapter.py):

```python
# Modify at the end of the file
import builtins
import os
import sys

_TARGET_MODULE = "paddlex.inference.pipelines.table_recognition.pipeline_v2"


def _auto_activate_on_module_import():
    """
    Activate automatically once the pipeline_v2 module has been imported.
    """
    if os.getenv("PADDLEX_ENABLE_TABLE_ADAPTER", "").lower() not in ("true", "1", "yes"):
        return

    # Save the original __import__. Use the builtins module here:
    # inside an imported module, __builtins__ is usually a plain dict.
    original_import = builtins.__import__
    depth = 0

    def patched_import(name, *args, **kwargs):
        nonlocal depth
        depth += 1
        try:
            module = original_import(name, *args, **kwargs)
        finally:
            depth -= 1

        # Check sys.modules instead of `name` (relative imports pass a short name),
        # and only after the outermost import returns, so the target is fully loaded
        if depth == 0 and _TARGET_MODULE in sys.modules:
            # Restore the original __import__ before the adapter does its own imports
            builtins.__import__ = original_import
            print("🎯 Detected pipeline_v2 import, activating adapter...")
            apply_table_recognition_adapter()

        return module

    # Replace __import__
    builtins.__import__ = patched_import


# Runs when this module is imported
_auto_activate_on_module_import()
```

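
Patching `__import__` is easiest to get right through the `builtins` module, because inside an imported module the name `__builtins__` usually refers to a dict rather than to the module (a CPython implementation detail). The following sketch shows a one-shot import hook of this kind on a stdlib module. `colorsys` stands in for the pipeline module and the `fired` list replaces the real adapter call; both are assumptions made for the illustration.

```python
import builtins
import sys

TARGET = "colorsys"            # stand-in for the real target module
sys.modules.pop(TARGET, None)  # make sure the import below is a fresh one

fired = []
original_import = builtins.__import__
depth = 0


def patched_import(name, *args, **kwargs):
    global depth
    depth += 1
    try:
        module = original_import(name, *args, **kwargs)
    finally:
        depth -= 1
    # Act once, after the outermost import statement has completed.
    if depth == 0 and TARGET in sys.modules:
        builtins.__import__ = original_import  # uninstall the hook first
        fired.append(TARGET)                   # stand-in for the adapter call
    return module


builtins.__import__ = patched_import

import colorsys  # triggers the hook
import json      # the hook is already gone, so this is a normal import

print(fired, builtins.__import__ is original_import)
```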
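
Then, in the startup script, import the adapter module in the same process that runs the service. Preloading it with a separate `python3 -c "import table_recognition_adapter"` would repeat the original problem, because that process exits and takes the import hook with it. The command below assumes, as the wrapper in Option 2 does, that `paddlex.paddlex_cli.main` reads its arguments from `sys.argv`:

```bash
#!/bin/bash
export PADDLEX_ENABLE_TABLE_ADAPTER="true"
export PYTHONPATH="$(pwd)/zhch/adapters${PYTHONPATH:+:$PYTHONPATH}"  # make sure the adapter module is importable

# Import the adapter (which installs the import hook) and start the service in the SAME process;
# the adapter is then activated once pipeline_v2 has been imported
nohup python3 -c "import table_recognition_adapter; from paddlex.paddlex_cli import main; main()" \
    --serve \
    --port 8111 \
    --device "gpu:3" \
    --pipeline "zhch/my_config/table_recognition_v2.yaml" \
    > "service.log" 2>&1 &
```
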
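
---

## Comparison of the options

| Option | Advantages | Disadvantages | Rating |
|------|------|------|--------|
| **sitecustomize.py** | ✅ Official Python mechanism<br>✅ Applies to every Python process started with this `PYTHONPATH` | ⚠️ May affect other Python programs | ⭐⭐⭐⭐ |
| **Wrapper script** | ✅ Runs in the same process as the service<br>✅ Adapter is applied before PaddleX starts | ⚠️ The wrapper script has to be maintained | ⭐⭐⭐⭐⭐ |
| **import hook** | ✅ Lazy activation<br>✅ Precise control | ⚠️ The adapter module must still be imported inside the service process | ⭐⭐⭐ |

**I recommend Option 2 (the wrapper script)** because it is the most reliable and the easiest to debug.

---
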
## Update README.md

Add the following to your README.md:

````markdown
# Starting the table recognition service (with the enhanced adapter)

## Starting the server

```bash
# Run from the repository root: the script and config paths below are relative to it
cd ~/zhch/PaddleX
conda activate paddle
export PADDLE_PDX_MODEL_SOURCE="bos"

# Start the service through the start script (activates the adapter automatically)
chmod +x zhch/start_table_recognition_service.sh
./zhch/start_table_recognition_service.sh 8111 "gpu:3" "zhch/my_config/table_recognition_v2.yaml"

# Follow the logs
tail -f table_recognition_service_*.log
```

## Calling from the client

```bash
# macOS client
python table_recognition_v2_client.py \
    --input_file "/Users/zhch158/workspace/data/至远彩色印刷工业有限公司/2023年度报告母公司.pdf" \
    --output_dir "./output" \
    --api_url "http://10.192.72.11:8111/table-recognition-v2"
```

## Disabling the adapter

```bash
# Method 1: edit the start script and change the export to
#   export PADDLEX_ENABLE_TABLE_ADAPTER="false"
# (the script sets the variable itself, so `unset` in the calling shell has no effect)
./zhch/start_table_recognition_service.sh

# Method 2: run the original command directly, without the variable set
unset PADDLEX_ENABLE_TABLE_ADAPTER
nohup paddlex --serve --port 8111 --device "gpu:3" --pipeline "zhch/my_config/table_recognition_v2.yaml" &
```
````


The adapter now takes effect in **the same process** as the service! 🎯
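
To confirm that the adapter was applied, one option is to search the newest service log for the activation message that the scripts print (both messages contain "adapter activated"). The helper below is a small sketch under that assumption; `adapter_activated` is a hypothetical name, and the default file pattern matches the log names used by the start script.

```python
import glob
import os


def adapter_activated(log_dir=".", pattern="table_recognition_service_*.log",
                      marker="adapter activated"):
    """Return True/False for the newest matching log, or None if no log exists."""
    logs = glob.glob(os.path.join(log_dir, pattern))
    if not logs:
        return None
    newest = max(logs, key=os.path.getmtime)
    with open(newest, encoding="utf-8", errors="replace") as f:
        # Case-insensitive match covers both activation messages.
        return any(marker.lower() in line.lower() for line in f)


print(adapter_activated())
```

A result of `False` means the service log exists but the activation message is missing, which usually points back to the process-isolation problem described at the top.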