layout_nms

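Based on a reading of the source code, layout_nms is a post-processing parameter of the layout detection models in PaddleX. It filters out overlapping detection boxes.

🎯 Core definition

layout_nms: Layout-aware Non-Maximum Suppression

📊 What it does

1. Basic function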

# paddlex/inference/models/object_detection/processors.py L736-765
def apply(
    self,
    boxes: ndarray,
    threshold: Union[float, dict],
    layout_nms: Optional[bool],  # 👈 key parameter
    layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]],
    layout_merge_bboxes_mode: Optional[Union[str, dict]],
) -> Boxes:
    """Post-process the detection results."""
    
    # ... threshold filtering ...
    
    if layout_nms:  # 👈 NMS enabled
        selected_indices = nms(boxes[:, :6], iou_same=0.6, iou_diff=0.98)
        boxes = np.array(boxes[selected_indices])

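Function: when several detection boxes overlap, the box with the highest confidence is kept and the redundant boxes are removed.

2. NMS implementation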

# paddlex/inference/models/object_detection/processors.py L614-650
def nms(boxes, iou_same=0.6, iou_diff=0.95):
    """
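    Layout-aware NMS.
    
    Args:
        boxes: detection boxes, one row per box: [class_id, score, x1, y1, x2, y2]
        iou_same: IoU threshold for boxes of the same class (default 0.6)
        iou_diff: IoU threshold for boxes of different classes (default 0.95)
    
    Returns:
        Indices of the boxes that are kept
    """
    scores = boxes[:, 1]
    indices = np.argsort(scores)[::-1]  # sort by confidence, descending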
    selected_boxes = []

    while len(indices) > 0:
        current = indices[0]
        current_box = boxes[current]
        current_class = current_box[0]
        current_coords = current_box[2:]

        selected_boxes.append(current)
        indices = indices[1:]

        filtered_indices = []
        for i in indices:
            box = boxes[i]
            box_class = box[0]
            box_coords = box[2:]
            iou_value = iou(current_coords, box_coords)
            
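            # 👇 Core logic: a low threshold for the same class, a high one for different classes
            threshold = iou_same if current_class == box_class else iou_diff

            # Keep the box only if its IoU with the current box is below the threshold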
            if iou_value < threshold:
                filtered_indices.append(i)

        indices = filtered_indices

    return selected_boxes
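
The `nms()` excerpt above calls an `iou()` helper that is not shown. The following is a minimal sketch of such a helper, assuming each box is given as `[x1, y1, x2, y2]` corner coordinates with plain continuous extents. It is an illustration only; the actual PaddleX helper may differ in details such as how pixel widths are counted.

```python
def iou(box1, box2):
    """Intersection over union of two [x1, y1, x2, y2] boxes (illustrative sketch)."""
    x1, y1, x2, y2 = box1
    x1_p, y1_p, x2_p, y2_p = box2

    # Corners of the intersection rectangle
    x1_i, y1_i = max(x1, x1_p), max(y1, y1_p)
    x2_i, y2_i = min(x2, x2_p), min(y2, y2_p)

    # Width and height clamp to zero when the boxes do not overlap
    inter_area = max(0.0, x2_i - x1_i) * max(0.0, y2_i - y1_i)

    area1 = (x2 - x1) * (y2 - y1)
    area2 = (x2_p - x1_p) * (y2_p - y1_p)
    union = area1 + area2 - inter_area
    return inter_area / union if union > 0 else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0, so the value can be compared directly against `iou_same` and `iou_diff`.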

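3. Key feature: dual-threshold strategy

Case	IoU threshold	Notes
Overlapping boxes of the same class	`iou_same=0.6`	Stricter: the lower-scoring box is removed once IoU ≥ 0.6
Overlapping boxes of different classes	`iou_diff=0.98`	Looser: the lower-scoring box is removed only once IoU ≥ 0.98, so boxes of different classes can coexist

Note: these are the values that `apply` passes to `nms()`; the function's own default for `iou_diff` is 0.95.

Example:

Scenario 1: two "text" boxes overlap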
┌─────────────┐
│  text (0.9) │
│   ┌─────────┼──┐
│   │ overlap │  │
└───┼─────────┘  │
    │ text (0.7) │
    └────────────┘
If IoU ≥ 0.6 → text (0.7) is removed

Scenario 2: a "text" box and a "table" box overlap
┌─────────────┐
│  text (0.9) │
│   ┌─────────┼──┐
│   │ overlap │  │
└───┼─────────┘  │
    │ table(0.8) │
    └────────────┘
If IoU < 0.98 → both are kept (boxes of different classes may overlap)
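
The decision made for each pair of boxes can be reduced to a single comparison. The sketch below uses a hypothetical helper name, `is_suppressed`, which is not part of PaddleX, and takes its thresholds from the values that `apply` passes to `nms()`. The IoU values in the calls are made up for illustration.

```python
def is_suppressed(iou_value, same_class, iou_same=0.6, iou_diff=0.98):
    """Return True if the lower-scoring box of a pair would be dropped (illustrative sketch)."""
    threshold = iou_same if same_class else iou_diff
    # nms() keeps a box only when its IoU is strictly below the threshold
    return iou_value >= threshold


# Hypothetical IoU values
print(is_suppressed(0.7, same_class=True))    # True: two "text" boxes, one is dropped
print(is_suppressed(0.7, same_class=False))   # False: "text" and "table" both stay
print(is_suppressed(0.99, same_class=False))  # True: near-identical boxes, one is dropped
```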

🔧 Usage

Option 1: pass it to `predict`

from paddlex import create_model

model = create_model(model_name="PP-DocLayout_plus-L")

# Enable layout_nms
output = model.predict(
    "input.png",
    layout_nms=True  # 👈 enable NMS
)

Option 2: set it in a configuration file

# zhch/my_config/PaddleOCR-VL-Client_debug.yaml
SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayoutV2
    layout_nms: True  # 👈 enable NMS
    threshold: 0.5

Option 3: pass it through a pipeline

from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="layout_parsing")

output = pipeline.predict(
    "input.png",
    layout_nms=True  # 👈 enable NMS
)

📈 Effect comparison

Disabled: `layout_nms=False` (default for a standalone model)

Detection result:
┌──────────────────┐
│ text (score=0.95)│
│  ┌───────────────┼─────┐
│  │   overlap     │     │
└──┼───────────────┘     │
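   │ text (score=0.85)   │  ← redundant box
   └─────────────────────┘

Problem: several overlapping detection boxes remain and clutter the result.

Enabled: `layout_nms=True`

Detection result:
┌──────────────────┐
│ text (score=0.95)│  ← only the higher-scoring box is kept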
└──────────────────┘

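Benefit: lower-scoring overlapping boxes are filtered out automatically, so the output is cleaner.

🎯 Practical scenarios

Scenario 1: overlapping boxes after layout detection

Problem: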

layout_det_res = {
    "boxes": [
        {"label": "text", "score": 0.92, "coordinate": [100, 100, 400, 200]},
        {"label": "text", "score": 0.78, "coordinate": [110, 110, 410, 210]},  # overlaps the box above
    ]
}

Solution:

# With NMS enabled
output = model.predict("input.png", layout_nms=True)
# Only the box with score=0.92 is kept
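
Why the lower-scoring box is dropped can be checked by hand. The following is a worked calculation for the two boxes in this scenario, assuming plain continuous coordinates (the library's own `iou` helper may count pixels slightly differently, which would shift the value only marginally):

```latex
\begin{aligned}
A_1 &= (400-100)(200-100) = 30000, \qquad A_2 = (410-110)(210-110) = 30000 \\
A_{\cap} &= (400-110)(200-110) = 290 \cdot 90 = 26100 \\
\mathrm{IoU} &= \frac{A_{\cap}}{A_1 + A_2 - A_{\cap}} = \frac{26100}{33900} \approx 0.77
\end{aligned}
```

Both boxes are "text" and 0.77 ≥ 0.6, so the box with score 0.78 is suppressed.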

Scenario 2: a table and text overlap (both must be kept)

Requirement: the table contains text, and both boxes need to be kept

layout_det_res = {
    "boxes": [
        {"label": "table", "score": 0.88, "coordinate": [50, 50, 500, 300]},
        {"label": "text", "score": 0.85, "coordinate": [60, 60, 200, 100]},  # text inside the table
    ]
}

Result:

# With NMS enabled, both are kept ✅: the classes differ and IoU < 0.98
output = model.predict("input.png", layout_nms=True)

🔍 Source locations

File	Relevant code
`paddlex/inference/models/object_detection/processors.py`	`nms()` function (L614-650)
`paddlex/inference/models/object_detection/predictor.py`	`layout_nms` parameter definition (L111)
`paddlex/inference/pipelines/layout_parsing/pipeline_v2.py`	Use in the pipeline (L1010)

⚙️ Default values

Where	Default
Model configuration	`None` (not enabled)
Pipeline configuration	`True` (enabled)
Manual call	Must be specified explicitly

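📋 Summary

Aspect	Description
Definition	Layout-aware non-maximum suppression
Purpose	Filter out overlapping detection boxes
Core logic	Dual-threshold strategy (strict for the same class, lenient for different classes)
Applicable to	Layout detection, object detection
Required?	No, but enabling it is recommended
Default	`True` in pipelines, `None` for a standalone model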

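🚀 Recommended configuration

# Best practice
model.predict(
    "input.png",
    threshold=0.5,        # confidence threshold
    layout_nms=True,      # 👈 enable NMS to filter overlapping boxes
    layout_unclip_ratio=1.0,  # box expansion ratio
)

Recommendation: always enable layout_nms=True in production to get cleaner detection results.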
