3 weeks ago · 283cba100c
--- a/zhch/Model推理参数说明.md
+++ b/zhch/Model推理参数说明.md
@@ -0,0 +1,287 @@
 
				+# layout_nms
			
 
				+
			
 
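				+Based on an analysis of the code, `layout_nms` is a **post-processing parameter of the layout detection model** in PaddleX; it filters out overlapping detection boxes.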
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 🎯 Core definition
			
 
				+
			
 
				+**`layout_nms`**: **Layout-aware Non-Maximum Suppression**
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 📊 What it does
			
 
				+
			
 
				+### 1. **Basic function**
			
 
				+
			
 
				+```python
			
 
				+# paddlex/inference/models/object_detection/processors.py L736-765
			
 
				+def apply(
			
 
				+    self,
			
 
				+    boxes: ndarray,
			
 
				+    threshold: Union[float, dict],
			
 
				+    layout_nms: Optional[bool],  # 👈 key parameter
			
 
				+    layout_unclip_ratio: Optional[Union[float, Tuple[float, float], dict]],
			
 
				+    layout_merge_bboxes_mode: Optional[Union[str, dict]],
			
 
				+) -> Boxes:
			
 
				+    """Post-process detection results."""
			
 
				+    
			
 
				+    # ... threshold filtering ...
			
 
				+    
			
 
				+    if layout_nms:  # 👈 NMS enabled
			
 
				+        selected_indices = nms(boxes[:, :6], iou_same=0.6, iou_diff=0.98)
			
 
				+        boxes = np.array(boxes[selected_indices])
			
 
				+```
			
 
				+
			
 
				+**Function**: when several detection boxes overlap, keep the one with the highest confidence and discard the redundant ones.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### 2. **NMS implementation logic**
			
 
				+
			
 
				+```python
			
 
				+# paddlex/inference/models/object_detection/processors.py L614-650
			
 
				+def nms(boxes, iou_same=0.6, iou_diff=0.95):
			
 
				+    """
			
 
				+    Layout-aware NMS.
			
 
				+    
			
 
				+    Args:
			
 
				+        boxes: detection boxes as [class_id, score, x1, y1, x2, y2]
			
 
				+        iou_same: IoU threshold for boxes of the same class (default 0.6)
			
 
				+        iou_diff: IoU threshold for boxes of different classes (default 0.95; the `layout_nms` call site passes 0.98)
			
 
				+    
			
 
				+    Returns:
			
 
				+        Indices of the boxes that are kept
			
 
				+    """
			
 
				+    scores = boxes[:, 1]
			
 
				+    indices = np.argsort(scores)[::-1]  # sort by confidence, descending
			
 
				+    selected_boxes = []
			
 
				+
			
 
				+    while len(indices) > 0:
			
 
				+        current = indices[0]
			
 
				+        current_box = boxes[current]
			
 
				+        current_class = current_box[0]
			
 
				+        current_coords = current_box[2:]
			
 
				+
			
 
				+        selected_boxes.append(current)
			
 
				+        indices = indices[1:]
			
 
				+
			
 
				+        filtered_indices = []
			
 
				+        for i in indices:
			
 
				+            box = boxes[i]
			
 
				+            box_class = box[0]
			
 
				+            box_coords = box[2:]
			
 
				+            iou_value = iou(current_coords, box_coords)
			
 
				+            
			
 
				+            # 👇 Core logic: low threshold for the same class, high threshold for different classes
			
 
				+            threshold = iou_same if current_class == box_class else iou_diff
			
 
				+
			
 
				+            # Keep the box if its IoU is below the threshold
			
 
				+            if iou_value < threshold:
			
 
				+                filtered_indices.append(i)
			
 
				+
			
 
				+        indices = filtered_indices
			
 
				+
			
 
				+    return selected_boxes
			
 
				+```
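The listing above calls an `iou` helper that is not shown. The following is a minimal pure-Python sketch, assuming plain area-based IoU over `[x1, y1, x2, y2]` boxes; PaddleX's own helper may differ in detail (for example in how pixel edges are counted). `nms_sketch` is a hypothetical name for a condensed restatement of the loop above, not the library function.

```python
from typing import List, Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Plain intersection-over-union for [x1, y1, x2, y2] boxes (assumed form)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, iou_same=0.6, iou_diff=0.95) -> List[int]:
    """Hypothetical restatement of the dual-threshold loop shown above.

    Each row is [class_id, score, x1, y1, x2, y2]; returns kept row indices.
    """
    # Visit rows from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][1], reverse=True)
    kept = []
    while order:
        current, order = order[0], order[1:]
        kept.append(current)
        survivors = []
        for i in order:
            same = boxes[i][0] == boxes[current][0]
            limit = iou_same if same else iou_diff
            # A row survives only while its IoU stays below the limit.
            if iou(boxes[current][2:6], boxes[i][2:6]) < limit:
                survivors.append(i)
        order = survivors
    return kept


rows = [
    [0, 0.92, 100, 100, 400, 200],  # class 0, highest score
    [0, 0.78, 110, 110, 410, 210],  # class 0, heavy overlap with row 0
    [1, 0.88, 50, 50, 500, 300],    # class 1, encloses both class-0 boxes
]
print(nms_sketch(rows, iou_same=0.6, iou_diff=0.98))
```

With these made-up rows, the second class-0 box is dropped while the enclosing class-1 box survives, which is the behaviour the dual thresholds are meant to produce.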
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### 3. **Key feature: dual-threshold strategy**
			
 
				+
			
 
				+| Case | IoU threshold | Notes |
			
 
				+|------|---------|------|
			
 
				+| **Same-class boxes overlap** | `iou_same=0.6` | Stricter: the lower-scoring box is removed once IoU ≥ 0.6 |
			
 
				+| **Different-class boxes overlap** | `iou_diff=0.98` | Looser: a box is removed only once IoU ≥ 0.98, so boxes of different classes can coexist |
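The table can be restated as a single decision rule. The sketch below uses a hypothetical helper name, `is_suppressed`, and takes the 0.6 and 0.98 values from the table; it is an illustration of the rule, not part of PaddleX.

```python
def is_suppressed(iou_value: float, same_class: bool,
                  iou_same: float = 0.6, iou_diff: float = 0.98) -> bool:
    """Return True when the lower-scoring box of a pair would be removed."""
    limit = iou_same if same_class else iou_diff
    # The nms loop keeps a box only when iou_value < limit,
    # so a box whose IoU equals the limit is removed as well.
    return iou_value >= limit


print(is_suppressed(0.70, same_class=True))   # same class, above 0.6
print(is_suppressed(0.70, same_class=False))  # different classes, below 0.98
print(is_suppressed(0.60, same_class=True))   # boundary case
```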
			
 
				+
			
 
				+**Example**:
			
 
				+
			
 
				+```text
			
 
				+Case 1: two "text" boxes overlap
			
 
				+┌─────────────┐
			
 
				+│  text (0.9) │
			
 
				+│   ┌─────────┼──┐
			
 
				+│   │ overlap │  │
			
 
				+└───┼─────────┘  │
			
 
				+    │ text (0.7) │
			
 
				+    └────────────┘
			
 
				+If IoU ≥ 0.6 → text (0.7) is removed
			
 
				+
			
 
				+Case 2: a "text" box and a "table" box overlap
			
 
				+┌─────────────┐
			
 
				+│  text (0.9) │
			
 
				+│   ┌─────────┼──┐
			
 
				+│   │ overlap │  │
			
 
				+└───┼─────────┘  │
			
 
				+    │ table(0.8) │
			
 
				+    └────────────┘
			
 
				+If IoU < 0.98 → both are kept (different classes may overlap)
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 🔧 Usage
			
 
				+
			
 
				+### Option 1: pass it to `predict`
			
 
				+
			
 
				+```python
			
 
				+from paddlex import create_model
			
 
				+
			
 
				+model = create_model(model_name="PP-DocLayout_plus-L")
			
 
				+
			
 
				+# Enable layout_nms
			
 
				+output = model.predict(
			
 
				+    "input.png", 
			
 
				+    layout_nms=True  # 👈 enable NMS
			
 
				+)
			
 
				+```
			
 
				+
			
 
				+### Option 2: set it in the configuration file
			
 
				+
			
 
				+```yaml
			
 
				+# zhch/my_config/PaddleOCR-VL-Client_debug.yaml
			
 
				+SubModules:
			
 
				+  LayoutDetection:
			
 
				+    module_name: layout_detection
			
 
				+    model_name: PP-DocLayoutV2
			
 
				+    layout_nms: True  # 👈 enable NMS
			
 
				+    threshold: 0.5
			
 
				+```
			
 
				+
			
 
				+### Option 3: pass it through the pipeline
			
 
				+
			
 
				+```python
			
 
				+from paddlex import create_pipeline
			
 
				+
			
 
				+pipeline = create_pipeline(pipeline="layout_parsing")
			
 
				+
			
 
				+output = pipeline.predict(
			
 
				+    "input.png",
			
 
				+    layout_nms=True  # 👈 enable NMS
			
 
				+)
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 📈 Before and after
			
 
				+
			
 
				+### Disabled: `layout_nms=False` (same effect as the single-model default `None`)
			
 
				+
			
 
				+```
			
 
				+Detection result:
			
 
				+┌──────────────────┐
			
 
				+│ text (score=0.95)│
			
 
				+│  ┌───────────────┼─────┐
			
 
				+│  │   overlap     │     │
			
 
				+└──┼───────────────┘     │
			
 
				+   │ text (score=0.85)   │  ← redundant box
			
 
				+   └─────────────────────┘
			
 
				+```
			
 
				+
			
 
				+**Problem**: several overlapping detection boxes remain and add noise to the result.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Enabled: `layout_nms=True`
			
 
				+
			
 
				+```
			
 
				+Detection result:
			
 
				+┌──────────────────┐
			
 
				+│ text (score=0.95)│  ← only the higher-scoring box is kept
			
 
				+└──────────────────┘
			
 
				+```
			
 
				+
			
 
				+**Benefit**: lower-scoring overlapping boxes are filtered automatically, giving cleaner output.
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 🎯 Practical scenarios
			
 
				+
			
 
				+### Scenario 1: overlapping boxes after layout detection
			
 
				+
			
 
				+**Problem**:
			
 
				+```python
			
 
				+layout_det_res = {
			
 
				+    "boxes": [
			
 
				+        {"label": "text", "score": 0.92, "coordinate": [100, 100, 400, 200]},
			
 
				+        {"label": "text", "score": 0.78, "coordinate": [110, 110, 410, 210]},  # overlaps the box above
			
 
				+    ]
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+**Solution**:
			
 
				+```python
			
 
				+# With NMS enabled
			
 
				+output = model.predict("input.png", layout_nms=True)
			
 
				+# Only the box with score=0.92 is kept
			
 
				+```
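As a check on the two boxes above, and assuming plain area-based IoU (the library's helper may count pixel edges slightly differently), each box has area $300 \times 100 = 30000$ and the overlap is $290 \times 90$:

$$
\mathrm{IoU} = \frac{(400-110)(200-110)}{2 \cdot 30000 - (400-110)(200-110)} = \frac{26100}{33900} \approx 0.77 \ge 0.6
$$

Both boxes are labelled `text`, so `iou_same=0.6` applies and the box with score 0.78 is removed.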
			
 
				+
			
 
				+---
			
 
				+
			
 
				+### Scenario 2: a table and text overlap (both must be kept)
			
 
				+
			
 
				+**Requirement**: the table contains text, and both boxes must be kept.
			
 
				+
			
 
				+```python
			
 
				+layout_det_res = {
			
 
				+    "boxes": [
			
 
				+        {"label": "table", "score": 0.88, "coordinate": [50, 50, 500, 300]},
			
 
				+        {"label": "text", "score": 0.85, "coordinate": [60, 60, 200, 100]},  # text inside the table
			
 
				+    ]
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+**Result**:
			
 
				+```python
			
 
				+# With NMS enabled, both are kept ✅ because the classes differ and IoU < 0.98
			
 
				+output = model.predict("input.png", layout_nms=True)
			
 
				+```
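The same check for this pair, again assuming plain area-based IoU: the text box lies entirely inside the table box, so the intersection equals the text area and the union equals the table area.

$$
\mathrm{IoU} = \frac{(200-60)(100-60)}{(500-50)(300-50)} = \frac{5600}{112500} \approx 0.05 < 0.98
$$

The labels differ, so `iou_diff=0.98` applies and neither box is removed.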
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 🔍 Source locations
			
 
				+
			
 
				+| File | Key code |
			
 
				+|------|---------|
			
 
				+| `paddlex/inference/models/object_detection/processors.py` | `nms()` function (L614-650) |
			
 
				+| `paddlex/inference/models/object_detection/predictor.py` | `layout_nms` parameter definition (L111) |
			
 
				+| `paddlex/inference/pipelines/layout_parsing/pipeline_v2.py` | Use in the pipeline (L1010) |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## ⚙️ Default values
			
 
				+
			
 
				+| Where | Default |
			
 
				+|------|--------|
			
 
				+| **Model configuration** | `None` (disabled) |
			
 
				+| **Pipeline configuration** | `True` (enabled) |
			
 
				+| **Manual call** | Must be specified explicitly |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 📋 Summary
			
 
				+
			
 
				+| Aspect | Description |
			
 
				+|------|------|
			
 
				+| **Definition** | Layout-aware non-maximum suppression |
			
 
				+| **Purpose** | Filters overlapping detection boxes |
			
 
				+| **Core logic** | Dual-threshold strategy (strict for the same class, loose across classes) |
			
 
				+| **Applies to** | Layout detection, object detection |
			
 
				+| **Required?** | No, but enabling it is recommended |
			
 
				+| **Default** | `True` in pipelines, `None` for a standalone model |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 🚀 Recommended configuration
			
 
				+
			
 
				+```python
			
 
				+# Best practice
			
 
				+model.predict(
			
 
				+    "input.png",
			
 
				+    threshold=0.5,        # confidence threshold
			
 
				+    layout_nms=True,      # 👈 enable NMS to filter overlapping boxes
			
 
				+    layout_unclip_ratio=1.0,  # box expansion ratio
			
 
				+)
			
 
				+```
			
 
				+
			
 
				+**Recommendation**: in production, **always enable** `layout_nms=True` for cleaner detection results.