PaddleX Pipeline Configuration Files

This directory contains the PaddleX pipeline configuration files. Each file configures a different document parsing pipeline.

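Configuration File Categories

PaddleOCR-VL Configurations

PaddleOCR-VL.yaml: base PaddleOCR-VL pipeline configuration
PaddleOCR-VL-Client.yaml: PaddleOCR-VL client configuration
PaddleOCR-VL-Client-RT-DETR-H_layout_17cls.yaml: PaddleOCR-VL configuration that uses the RT-DETR-H layout detection model (17 layout classes)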

PP-StructureV3 Configurations

PP-StructureV3.yaml: base PP-StructureV3 pipeline configuration
PP-StructureV3-zhch.yaml: customized PP-StructureV3 configuration (zhch version)
PP-StructureV3-RT-DETR-H_layout_17cls.yaml: PP-StructureV3 configuration that uses the RT-DETR-H layout detection model (17 layout classes)

Other Configurations

layout_parsing.yaml: layout parsing configuration
table_recognition_v2.yaml: table recognition V2 configuration
table_recognition_v2-zhch.yaml: customized table recognition V2 configuration (zhch version)

Usage

From the Command Line

# Relative path (run from the tool directory)
python main.py --input document.pdf --output_dir ./output \
  --pipeline ../paddle_common/config/PaddleOCR-VL-Client-RT-DETR-H_layout_17cls.yaml

# Absolute path
python main.py --input document.pdf --output_dir ./output \
  --pipeline /path/to/ocr_platform/ocr_tools/paddle_common/config/PP-StructureV3-zhch.yaml

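From Code

from pathlib import Path

# PaddleXProcessor is this project's own wrapper class; import it from
# the module that defines it in your checkout (not shown here).

# Locate the configuration file
config_dir = Path(__file__).parent / "config"
config_path = config_dir / "PaddleOCR-VL-Client-RT-DETR-H_layout_17cls.yaml"

# Initialize the pipeline from the configuration file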
processor = PaddleXProcessor(
    pipeline_name=str(config_path),
    device="gpu:0"
)
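
When several configuration files live in the same directory, it can help to discover them instead of hard-coding each file name. The following is a minimal sketch using only the standard library; the helper name `list_pipeline_configs` is hypothetical and not part of this project or of PaddleX.

```python
from pathlib import Path


def list_pipeline_configs(config_dir):
    """Map each config name (file name without .yaml) to its path.

    Hypothetical helper for illustration; only *.yaml files directly
    inside config_dir are considered, sorted by name.
    """
    paths = sorted(Path(config_dir).glob("*.yaml"))
    return {path.stem: path for path in paths}
```

For example, `list_pipeline_configs(Path(__file__).parent / "config")` would return a dictionary whose keys include `PP-StructureV3-zhch`, and the matching value can be passed on as `str(path)`.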

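Configuration Notes

PaddleOCR-VL vs PP-StructureV3

PaddleOCR-VL: document parsing based on a vision-language model, focused on visual understanding
PP-StructureV3: more comprehensive document structure analysis, including recognition of tables, formulas, and charts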

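RT-DETR-H Layout Detection Model

These configurations use RT-DETR-H as the layout detection model. The model name indicates 17 layout classes, although the list below names 21 labels, so treat the model's own label list as authoritative: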

abstract, algorithm, aside_text, chart, content, formula
doc_title, figure_title, footer, footnote, formula_number
header, image, number, paragraph_title, reference
reference_content, seal, table, text, vision_footnote
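
The labels above can be kept in code to select only the regions of interest from a layout detection result. This is a sketch under assumptions: the result is assumed to be a list of dictionaries with `label` and `score` keys, and `filter_boxes` is a hypothetical helper, not a PaddleX API.

```python
# Labels copied verbatim from the list above (21 entries).
LAYOUT_LABELS = (
    "abstract", "algorithm", "aside_text", "chart", "content", "formula",
    "doc_title", "figure_title", "footer", "footnote", "formula_number",
    "header", "image", "number", "paragraph_title", "reference",
    "reference_content", "seal", "table", "text", "vision_footnote",
)


def filter_boxes(boxes, keep_labels, min_score=0.5):
    """Keep boxes whose label is in keep_labels and whose score passes min_score.

    Assumes each box is a dict with "label" and "score" keys.
    Raises ValueError for labels that are not in LAYOUT_LABELS, which
    catches typos such as "tabel" early.
    """
    unknown = set(keep_labels) - set(LAYOUT_LABELS)
    if unknown:
        raise ValueError(f"unknown layout labels: {sorted(unknown)}")
    return [
        box for box in boxes
        if box["label"] in keep_labels and box["score"] >= min_score
    ]
```

Calling `filter_boxes(boxes, {"table", "formula"})` would keep only table and formula regions with a score of at least 0.5.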

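Custom Configurations (zhch Version)

Configuration files with the -zhch suffix are customized versions and may include:

Adjusted threshold parameters
Tuned model configuration
Specific feature switch settings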
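
To see exactly what a -zhch file changes relative to its base file, the two configurations can be compared key by key once they are loaded as nested dictionaries. The sketch below is one way to do that; `diff_config` and `MISSING` are hypothetical names, and nothing here is taken from the actual YAML contents.

```python
# Sentinel marking a key that exists in only one of the two configs.
MISSING = object()


def diff_config(base, custom, prefix=""):
    """Yield (dotted_key, base_value, custom_value) for every difference.

    Nested dicts are compared recursively; any other value (including
    lists) is compared as a whole with !=.
    """
    for key in sorted(set(base) | set(custom), key=str):
        path = f"{prefix}{key}"
        old = base.get(key, MISSING)
        new = custom.get(key, MISSING)
        if isinstance(old, dict) and isinstance(new, dict):
            yield from diff_config(old, new, prefix=path + ".")
        elif old != new:
            yield path, old, new
```

With PyYAML installed, each file can be loaded with `yaml.safe_load(path.read_text(encoding="utf-8"))` and the two resulting dictionaries passed to `diff_config`, for instance for PP-StructureV3.yaml and PP-StructureV3-zhch.yaml.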

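Notes

Path references: the configuration file path can be relative or absolute
Pipeline name: a pipeline name (for example PaddleOCR-VL) can also be passed directly, without specifying a configuration file
Device: some configurations may require a specific device (GPU or CPU)
Model files: make sure the model files referenced in the configuration file are installed correctly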
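
Because the pipeline argument may be either a file path or a bare pipeline name, a small check before initialization can turn a mistyped path into an early, clear error. The following is a minimal sketch; `resolve_pipeline` is a hypothetical helper, and treating any value that ends in .yaml or .yml as a path is an assumption made for illustration.

```python
from pathlib import Path


def resolve_pipeline(value, config_dir=None):
    """Return an absolute path for YAML files, or the name unchanged.

    Values ending in .yaml/.yml are treated as config file paths
    (assumption); relative paths that do not exist as given are also
    looked up inside config_dir when it is provided. Anything else is
    returned as-is and treated as a pipeline name.
    """
    candidate = Path(value).expanduser()
    if candidate.suffix.lower() not in {".yaml", ".yml"}:
        return value  # bare pipeline name such as "PaddleOCR-VL"
    if (not candidate.is_file() and not candidate.is_absolute()
            and config_dir is not None):
        candidate = Path(config_dir) / candidate
    if not candidate.is_file():
        raise FileNotFoundError(f"pipeline config not found: {value}")
    return str(candidate.resolve())
```

The returned string can then be used wherever the examples above pass a pipeline name or configuration path.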
