refactor(pre_proc): adjust IOU threshold for character overlap detection

- Modified the IOU threshold in ocr_span_list_modify.py from 0.9 to 0.35
- This change aims to improve detection of overlapping characters in OCR-processed PDFs
myhloli, 10 months ago
Commit f37b14bc83
1 file changed, with 1 insertion and 1 deletion

+ 1 - 1
magic_pdf/pre_proc/ocr_span_list_modify.py

@@ -36,7 +36,7 @@ def remove_overlaps_low_confidence_spans(spans):
 def check_chars_is_overlap_in_span(chars):
     for i in range(len(chars)):
         for j in range(i + 1, len(chars)):
-            if calculate_iou(chars[i]['bbox'], chars[j]['bbox']) > 0.9:
+            if calculate_iou(chars[i]['bbox'], chars[j]['bbox']) > 0.35:
                 return True
     return False
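
To illustrate what the threshold change does, here is a minimal sketch. The body of `calculate_iou` is not shown in this diff, so the `iou` helper below is a hypothetical stand-in that assumes boxes are `(x0, y0, x1, y1)` tuples and uses the standard intersection-over-union definition. `chars_overlap` mirrors `check_chars_is_overlap_in_span` but takes the threshold as a parameter so both values can be compared. The sample boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes.

    Hypothetical stand-in for calculate_iou, whose body is not in the diff.
    """
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not intersect
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def chars_overlap(chars, threshold):
    """Same pairwise loop as the patched function, with a configurable threshold."""
    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if iou(chars[i]['bbox'], chars[j]['bbox']) > threshold:
                return True
    return False


# Two 10x10 character boxes, the second shifted right by 3 units.
# Intersection = 7 * 10 = 70, union = 100 + 100 - 70 = 130, IoU = 70 / 130.
chars = [{'bbox': (0, 0, 10, 10)}, {'bbox': (3, 0, 13, 10)}]

print(chars_overlap(chars, 0.9))   # False: old threshold misses the partial overlap
print(chars_overlap(chars, 0.35))  # True: new threshold flags it
```

Under these assumptions, the old value of 0.9 only fires when two character boxes are nearly coincident, while 0.35 also catches characters that overlap by roughly half their width. Whether 0.35 is the right cutoff for real OCR output depends on the data and is not something this sketch can show.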