
DiT Support Module

This directory contains the core code and configuration files required for layout detection with DiT (Document Image Transformer).

Directory Structure

dit_support/
├── ditod/                    # DiT core module
│   ├── __init__.py          # Module exports (inference essentials only)
│   ├── config.py            # Config extension (add_vit_config)
│   ├── backbone.py          # ViT backbone implementation
│   ├── beit.py              # BEiT/DiT model definitions
│   └── deit.py              # DeiT model definitions (optional)
└── configs/                  # Configuration files
    ├── Base-RCNN-FPN.yaml   # Base configuration
    └── cascade/
        └── cascade_dit_large.yaml  # Cascade R-CNN + DiT-large configuration

Usage

To use DiT layout detection in universal_doc_parser:

import cv2

from models.adapters import get_layout_detector

# Configure the DiT detector
config = {
    'module': 'dit',
    'config_file': 'dit_support/configs/cascade/cascade_dit_large.yaml',
    'model_weights': 'https://huggingface.co/HYPJUDY/dit/resolve/main/dit-fts/publaynet_dit-l_cascade.pth',
    'device': 'cpu',  # or 'cuda'
    'conf': 0.3,
    'remove_overlap': True,
    'iou_threshold': 0.8,
    'overlap_ratio_threshold': 0.8,
}

# Create and initialize the detector
detector = get_layout_detector(config)
detector.initialize()

# Run layout detection
img = cv2.imread('image.jpg')
if img is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError('image.jpg')
results = detector.detect(img)

# Release resources
detector.cleanup()
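
The remove_overlap, iou_threshold, and overlap_ratio_threshold keys in the configuration above control overlapping-box removal. The adapter's actual logic is not shown here, so the following is only a minimal sketch of one plausible interpretation: a greedy filter that drops a box when its IoU with a higher-scoring kept box, or its intersection relative to the smaller box, reaches the threshold. The (x1, y1, x2, y2) box format, the (box, score) pair layout, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, confidence score)


def area(box: Box) -> float:
    """Area of a box; zero when the box is empty or inverted."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a: Box, b: Box) -> float:
    """Area shared by two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    """Intersection over union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection divided by the area of the smaller box."""
    smaller = min(area(a), area(b))
    return intersection(a, b) / smaller if smaller > 0 else 0.0


def remove_overlaps(dets: List[Detection],
                    iou_threshold: float = 0.8,
                    overlap_ratio_threshold: float = 0.8) -> List[Detection]:
    """Greedily keep high-scoring boxes, dropping boxes that overlap a kept one."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_threshold
               and overlap_ratio(box, k) < overlap_ratio_threshold
               for k, _ in kept):
            kept.append((box, score))
    return kept
```

Under this interpretation, a small box nested entirely inside a larger, higher-scoring box is removed by the overlap-ratio test even though its IoU with the larger box is low.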

Dependencies

The following Python packages are required:

# 1. PyTorch (must be installed first)
pip install torch torchvision

# 2. detectron2
# Mac M4 Pro / Apple Silicon:
CC=clang CXX=clang++ ARCHFLAGS="-arch arm64" pip install --no-build-isolation 'git+https://github.com/facebookresearch/detectron2.git'

# Linux (CPU):
pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Linux (CUDA):
pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'

# 3. timm (Vision Transformer model library)
pip install timm

# 4. Basic dependencies
pip install numpy opencv-python Pillow einops
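
After installation, it can help to confirm that every package listed above is importable before initializing the detector. The snippet below is a small sketch using only the standard library; the check_modules helper is a hypothetical name, not part of this module. Note that opencv-python is imported as cv2 and Pillow as PIL.

```python
import importlib.util
from typing import Dict, Iterable

# Import names corresponding to the pip packages listed above.
REQUIRED_MODULES = ("torch", "torchvision", "detectron2", "timm",
                    "numpy", "cv2", "PIL", "einops")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```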

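Migration Notes

This module is a minimal version migrated from unilm/dit/object_detection/ and contains only the code required for inference:

✅ Migrated: the ditod core module (5 files) and the configuration files (2)
❌ Not migrated: training code (dataset_mapper.py, mytrainer.py, etc.) and evaluation code (icdar_evaluation.py, table_evaluation/)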

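Notes

Paths: make sure the dit_support directory is on the Python path (the adapter handles this automatically).
Model weights: downloaded automatically from HuggingFace on first run, which requires a network connection.
PyTorch 2.6+: the code already includes compatibility fixes.
Overlapping boxes: overlap removal is enabled by default; it can be disabled, or its thresholds adjusted, in the configuration.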
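
If ditod is imported outside the adapter, the dit_support directory has to be placed on the Python path manually. A minimal sketch of doing so follows; ensure_on_sys_path is a hypothetical helper, and this is not necessarily how the adapter itself handles the path.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory) -> str:
    """Prepend the resolved directory to sys.path if it is not already present."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Example (assumes the current working directory contains dit_support/):
# ensure_on_sys_path("dit_support")
# import ditod
```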
