@@ -0,0 +1,287 @@
+# layout_nms
+
+Based on a reading of the code, `layout_nms` is a **post-processing parameter of the layout detection models** in PaddleX. It filters out overlapping detection boxes.
+
+---
+
+## 🎯 Core definition
+
+**`layout_nms`**: **Layout-aware Non-Maximum Suppression**
+
+---
+
+## 📊 What it does
+
+### 1. **Basic function**
+
+```python
+# paddlex/inference/models/object_detection/processors.py L736-765
+def apply(
+    self,
+    boxes: ndarray,
+    threshold: Union[float, dict],
+    layout_nms: Optional[bool],  # 👈 key parameter
+    layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]],
+    layout_merge_bboxes_mode: Optional[Union[str, dict]],
+) -> Boxes:
+    """Post-process the detection results."""
+
+    # ... threshold filtering ...
+
+    if layout_nms:  # 👈 NMS enabled
+        # This call overrides the nms() default iou_diff=0.95 with 0.98
+        selected_indices = nms(boxes[:, :6], iou_same=0.6, iou_diff=0.98)
+        boxes = np.array(boxes[selected_indices])
+```
+
+**Function**: when several detection boxes overlap, keep the one with the highest confidence and drop the redundant ones.
+
+---
+
+### 2. **NMS implementation logic**
+
+```python
+# paddlex/inference/models/object_detection/processors.py L614-650
+def nms(boxes, iou_same=0.6, iou_diff=0.95):
+    """
+    Layout-aware NMS.
+
+    Args:
+        boxes: detection boxes, one row per box: [class_id, score, x1, y1, x2, y2]
+        iou_same: IoU threshold for boxes of the same class (default 0.6)
+        iou_diff: IoU threshold for boxes of different classes (default 0.95)
+
+    Returns:
+        Indices of the boxes to keep
+    """
+    scores = boxes[:, 1]
+    indices = np.argsort(scores)[::-1]  # sort by confidence, descending
+    selected_boxes = []
+
+    while len(indices) > 0:
+        current = indices[0]
+        current_box = boxes[current]
+        current_class = current_box[0]
+        current_coords = current_box[2:]
+
+        selected_boxes.append(current)
+        indices = indices[1:]
+
+        filtered_indices = []
+        for i in indices:
+            box = boxes[i]
+            box_class = box[0]
+            box_coords = box[2:]
+            iou_value = iou(current_coords, box_coords)
+
+            # 👇 Core logic: low threshold for the same class, high threshold across classes
+            threshold = iou_same if current_class == box_class else iou_diff
+
+            # Keep the box only if its IoU is below the threshold
+            if iou_value < threshold:
+                filtered_indices.append(i)
+
+        indices = filtered_indices
+
+    return selected_boxes
+```
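The excerpt above calls an `iou` helper that is not shown. The following is a minimal, self-contained sketch of the same greedy loop, assuming a plain area-ratio IoU (PaddleX's own helper may use a different pixel convention). The name `class_aware_nms` and the class ids `0 = text`, `1 = table` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def iou(box_a, box_b):
    """Plain IoU of two [x1, y1, x2, y2] boxes (assumed helper, not PaddleX's)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def class_aware_nms(boxes, iou_same=0.6, iou_diff=0.98):
    """Greedy NMS over rows of [class_id, score, x1, y1, x2, y2]."""
    # Visit boxes from highest to lowest score
    order = [int(i) for i in np.argsort(boxes[:, 1])[::-1]]
    keep = []
    while order:
        current = order.pop(0)
        keep.append(current)
        survivors = []
        for i in order:
            same_class = boxes[i, 0] == boxes[current, 0]
            threshold = iou_same if same_class else iou_diff
            # A box survives only while its IoU stays below the threshold
            if iou(boxes[current, 2:], boxes[i, 2:]) < threshold:
                survivors.append(i)
        order = survivors
    return keep


# Hypothetical class ids: 0 = text, 1 = table
boxes = np.array(
    [
        [0, 0.92, 100, 100, 400, 200],  # text
        [0, 0.78, 110, 110, 410, 210],  # text, heavily overlaps row 0
        [1, 0.88, 50, 50, 500, 300],    # table
        [0, 0.85, 60, 60, 200, 100],    # text inside the table
    ],
    dtype=float,
)

kept = class_aware_nms(boxes)
print(kept)
```

With these inputs only row 1 is dropped: it shares a class with row 0 and overlaps it beyond `iou_same`, while the table and the text box inside it differ in class and stay far below `iou_diff`.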
+
+---
+
+### 3. **Key feature: dual-threshold strategy**
+
+| Scenario | IoU threshold | Notes |
+|------|---------|------|
+| **Overlapping boxes, same class** | `iou_same=0.6` | Stricter: the lower-scoring box is removed once IoU ≥ 0.6 |
+| **Overlapping boxes, different classes** | `iou_diff=0.98` | Looser: the lower-scoring box is removed only once IoU ≥ 0.98, so boxes of different classes can coexist |
+
+**Example**:
+
+```text
+Scenario 1: two "text" boxes overlap
+┌─────────────┐
+│ text (0.9)  │
+│   ┌─────────┼──┐
+│   │ overlap │  │
+└───┼─────────┘  │
+    │ text (0.7) │
+    └────────────┘
+If IoU ≥ 0.6 → text (0.7) is removed
+
+Scenario 2: a "text" box and a "table" box overlap
+┌─────────────┐
+│ text (0.9)  │
+│   ┌─────────┼──┐
+│   │ overlap │  │
+└───┼─────────┘  │
+    │ table(0.8) │
+    └────────────┘
+If IoU < 0.98 → both are kept (different classes may overlap)
+```
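The dual-threshold rule reduces to a single comparison. The sketch below isolates it so the boundary behaviour is easy to check; `is_suppressed` is a hypothetical helper written for this note, and it assumes the `<` comparison shown in the `nms()` excerpt, which means a box whose IoU equals the threshold exactly is removed.

```python
def is_suppressed(iou_value, same_class, iou_same=0.6, iou_diff=0.98):
    """Return True if the lower-scoring box would be dropped."""
    threshold = iou_same if same_class else iou_diff
    # nms() keeps a box when iou_value < threshold, so it drops it otherwise
    return iou_value >= threshold


checks = {
    "same class, IoU 0.77": is_suppressed(0.77, same_class=True),
    "different class, IoU 0.77": is_suppressed(0.77, same_class=False),
    "same class, IoU exactly 0.6": is_suppressed(0.6, same_class=True),
    "different class, IoU 0.99": is_suppressed(0.99, same_class=False),
}
for case, dropped in checks.items():
    print(f"{case}: {'dropped' if dropped else 'kept'}")
```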
+
+---
+
+## 🔧 Usage
+
+### Option 1: pass it to `predict`
+
+```python
+from paddlex import create_model
+
+model = create_model(model_name="PP-DocLayout_plus-L")
+
+# Enable layout_nms
+output = model.predict(
+    "input.png",
+    layout_nms=True  # 👈 enable NMS
+)
+```
+
+### Option 2: set it in the config file
+
+```yaml
+# zhch/my_config/PaddleOCR-VL-Client_debug.yaml
+SubModules:
+  LayoutDetection:
+    module_name: layout_detection
+    model_name: PP-DocLayoutV2
+    layout_nms: True  # 👈 enable NMS
+    threshold: 0.5
+```
+
+### Option 3: pass it through a pipeline
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="layout_parsing")
+
+output = pipeline.predict(
+    "input.png",
+    layout_nms=True  # 👈 enable NMS
+)
+```
+
+---
+
+## 📈 Effect comparison
+
+### Disabled: `layout_nms=False` (or left at the standalone-model default `None`)
+
+```text
+Detection result:
+┌──────────────────┐
+│ text (score=0.95)│
+│  ┌───────────────┼─────┐
+│  │ overlap       │     │
+└──┼───────────────┘     │
+   │ text (score=0.85)   │ ← redundant box
+   └─────────────────────┘
+```
+
+**Problem**: several overlapping boxes cover the same region, which clutters the result.
+
+---
+
+### Enabled: `layout_nms=True`
+
+```text
+Detection result:
+┌──────────────────┐
+│ text (score=0.95)│ ← only the higher-scoring box is kept
+└──────────────────┘
+```
+
+**Benefit**: lower-scoring overlapping boxes are filtered out automatically, giving cleaner output.
+
+---
+
+## 🎯 Practical scenarios
+
+### Scenario 1: overlapping boxes after layout detection
+
+**Problem**:
+```python
+layout_det_res = {
+    "boxes": [
+        {"label": "text", "score": 0.92, "coordinate": [100, 100, 400, 200]},
+        {"label": "text", "score": 0.78, "coordinate": [110, 110, 410, 210]},  # overlaps the first box
+    ]
+}
+```
+
+**Fix**:
+```python
+# With NMS enabled
+output = model.predict("input.png", layout_nms=True)
+# Only the box with score=0.92 is kept
+```
+
+---
+
+### Scenario 2: a table and text overlap (both must be kept)
+
+**Requirement**: the table contains text, and both boxes must be kept.
+
+```python
+layout_det_res = {
+    "boxes": [
+        {"label": "table", "score": 0.88, "coordinate": [50, 50, 500, 300]},
+        {"label": "text", "score": 0.85, "coordinate": [60, 60, 200, 100]},  # text inside the table
+    ]
+}
+```
+
+**Result**:
+```python
+# With NMS enabled, both boxes are kept ✅: the classes differ and IoU < 0.98
+output = model.predict("input.png", layout_nms=True)
+```
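To see why the two scenarios come out differently, it helps to compute the IoU values by hand. The sketch below does this for the coordinates listed above, assuming a plain area-ratio IoU; the `iou` function here is written for this note, and the helper inside PaddleX may differ slightly in its pixel convention, so treat the numbers as approximate.

```python
def iou(box_a, box_b):
    """Plain IoU of two [x1, y1, x2, y2] boxes (assumed convention)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Scenario 1: two "text" boxes, intersection 290 x 90 = 26100, union 33900
same_class_iou = iou([100, 100, 400, 200], [110, 110, 410, 210])

# Scenario 2: "text" nested in "table", intersection 5600, union 112500
cross_class_iou = iou([50, 50, 500, 300], [60, 60, 200, 100])

print(f"scenario 1 IoU = {same_class_iou:.3f}, compared against iou_same=0.6")
print(f"scenario 2 IoU = {cross_class_iou:.3f}, compared against iou_diff=0.98")
```

Scenario 1 lands at roughly 0.77, above `iou_same`, so the 0.78-score box is dropped. Scenario 2 lands at roughly 0.05, far below `iou_diff`, so both boxes stay.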
+
+---
+
+## 🔍 Source locations
+
+| File | Key code |
+|------|---------|
+| `paddlex/inference/models/object_detection/processors.py` | `nms()` function (L614-650) |
+| `paddlex/inference/models/object_detection/predictor.py` | Definition of the `layout_nms` parameter (L111) |
+| `paddlex/inference/pipelines/layout_parsing/pipeline_v2.py` | Use in the pipeline (L1010) |
+
+---
+
+## ⚙️ Defaults
+
+| Where | Default |
+|------|--------|
+| **Model config** | `None` (not enabled) |
+| **Pipeline config** | `True` (enabled) |
+| **Manual call** | Must be specified explicitly |
+
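The reason `None` counts as "not enabled" follows from the `if layout_nms:` guard in the `apply()` excerpt: Python treats `None` as falsy, so the NMS branch is skipped. The sketch below only mirrors that guard; `nms_will_run` is a hypothetical helper, and how PaddleX resolves an unset value against the config before reaching the guard is not shown in this document.

```python
def nms_will_run(layout_nms):
    """Mirror the `if layout_nms:` guard from apply(); hypothetical helper."""
    return bool(layout_nms)


results = {value: nms_will_run(value) for value in (None, False, True)}
for value, runs in results.items():
    print(f"layout_nms={value!r}: NMS {'runs' if runs else 'is skipped'}")
```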
+---
+
+## 📋 Summary
+
+| Aspect | Description |
+|------|------|
+| **Definition** | Layout-aware non-maximum suppression |
+| **Purpose** | Filter overlapping detection boxes |
+| **Core logic** | Dual-threshold strategy (strict for the same class, lenient across classes) |
+| **Where it applies** | Layout detection, object detection |
+| **Required?** | No, but enabling it is recommended |
+| **Default** | `True` in pipelines, `None` for a standalone model |
+
+---
+
+## 🚀 Recommended configuration
+
+```python
+# Best practice
+model.predict(
+    "input.png",
+    threshold=0.5,  # confidence threshold
+    layout_nms=True,  # 👈 enable NMS to filter overlapping boxes
+    layout_unclip_ratio=1.0,  # box expansion ratio
+)
+```
+
+**Recommendation**: **always enable** `layout_nms=True` in production for cleaner detection results.