refactor(pre_proc): adjust IOU threshold for character overlap detection

- Modified the IOU threshold in ocr_span_list_modify.py from 0.9 to 0.35
- This change aims to improve detection of overlapping characters in OCR-processed PDFs
myhloli, 10 months ago
parent
commit
f37b14bc83
1 file changed, with 1 addition and 1 deletion

+ 1 - 1
magic_pdf/pre_proc/ocr_span_list_modify.py

@@ -36,7 +36,7 @@ def remove_overlaps_low_confidence_spans(spans):
 def check_chars_is_overlap_in_span(chars):
     for i in range(len(chars)):
         for j in range(i + 1, len(chars)):
-            if calculate_iou(chars[i]['bbox'], chars[j]['bbox']) > 0.9:
+            if calculate_iou(chars[i]['bbox'], chars[j]['bbox']) > 0.35:
                 return True
     return False
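
To make the effect of the new threshold concrete, here is a minimal sketch of an intersection-over-union check. The `iou` helper below is a hypothetical stand-in, assuming boxes are `(x0, y0, x1, y1)` tuples; it is not the repository's `calculate_iou`, whose implementation is not shown in this diff. `chars_overlap` mirrors the loop in `check_chars_is_overlap_in_span` but takes the threshold as a parameter, which is also an assumption made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes (illustrative helper)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the intersection rectangle, clamped at zero
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def chars_overlap(chars, threshold):
    """Return True if any pair of character boxes has IoU above the threshold."""
    return any(
        iou(chars[i]["bbox"], chars[j]["bbox"]) > threshold
        for i in range(len(chars))
        for j in range(i + 1, len(chars))
    )


# Two 10x10 character boxes, the second shifted 4 units to the right:
# intersection = 6 * 10 = 60, union = 100 + 100 - 60 = 140, IoU = 60 / 140
chars = [
    {"bbox": (0, 0, 10, 10)},
    {"bbox": (4, 0, 14, 10)},
]

print(chars_overlap(chars, 0.9))   # old threshold
print(chars_overlap(chars, 0.35))  # new threshold
```

In this example the two boxes share 60% of their width, giving an IoU of about 0.43. The old threshold of 0.9 would not flag the pair, while the new threshold of 0.35 does, so the change makes the check sensitive to partially overlapping characters and not only to near-duplicates.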