
refactor(ocr_dict_merge): add threshold parameter for line merging

- Add threshold parameter to merge_spans_to_line function (default 0.6)
- Pass threshold to the y-axis overlap check, replacing the hard-coded 0.5
- Improve flexibility and accuracy of line merging algorithm
myhloli 1 year ago
parent
commit
b9f78c9ba1
1 changed file, 2 additions and 2 deletions

+ 2 - 2
magic_pdf/pre_proc/ocr_dict_merge.py

@@ -24,7 +24,7 @@ def line_sort_spans_by_left_to_right(lines):
     return line_objects
 
 
-def merge_spans_to_line(spans):
+def merge_spans_to_line(spans, threshold=0.6):
     if len(spans) == 0:
         return []
     else:
@@ -49,7 +49,7 @@ def merge_spans_to_line(spans):
                 continue
 
             # If the current span overlaps the last span of the current line on the y-axis, add it to the current line
-            if __is_overlaps_y_exceeds_threshold(span['bbox'], current_line[-1]['bbox'], 0.5):
+            if __is_overlaps_y_exceeds_threshold(span['bbox'], current_line[-1]['bbox'], threshold):
                 current_line.append(span)
             else:
                 # Otherwise, start a new line
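
To illustrate how the threshold affects line merging, here is a minimal, self-contained sketch. It is not the code from this commit: `y_overlap_exceeds` is a hypothetical stand-in for `__is_overlaps_y_exceeds_threshold` (whose body is not shown in this diff), assumed here to compare the vertical overlap against the height of the shorter box, and `merge_spans_to_line_sketch` omits whatever special cases the real function handles before the overlap check.

```python
def y_overlap_exceeds(bbox1, bbox2, threshold):
    """Hypothetical stand-in for __is_overlaps_y_exceeds_threshold.

    Assumption: bboxes are [x0, y0, x1, y1], and the check is
    (vertical overlap) / (height of the shorter box) > threshold.
    """
    _, top1, _, bottom1 = bbox1
    _, top2, _, bottom2 = bbox2
    overlap = max(0, min(bottom1, bottom2) - max(top1, top2))
    min_height = min(bottom1 - top1, bottom2 - top2)
    if min_height <= 0:
        return False  # degenerate box: treat as no overlap
    return overlap / min_height > threshold


def merge_spans_to_line_sketch(spans, threshold=0.6):
    """Simplified sketch: group spans into lines by y-axis overlap."""
    if len(spans) == 0:
        return []
    # Assumption: process spans from top to bottom.
    spans = sorted(spans, key=lambda s: s['bbox'][1])
    lines = []
    current_line = [spans[0]]
    for span in spans[1:]:
        # Same-line test against the last span of the current line
        if y_overlap_exceeds(span['bbox'], current_line[-1]['bbox'], threshold):
            current_line.append(span)
        else:
            lines.append(current_line)
            current_line = [span]
    lines.append(current_line)
    return lines


# Two spans whose vertical overlap is 55% of the shorter box's height
spans = [
    {'bbox': [0, 0, 50, 10], 'content': 'a'},
    {'bbox': [60, 4.5, 110, 14.5], 'content': 'b'},
]
lines_old = merge_spans_to_line_sketch(spans, threshold=0.5)
lines_new = merge_spans_to_line_sketch(spans)  # default 0.6
```

Under these assumptions, the two sample spans end up on one line with the previous hard-coded value of 0.5 and on two separate lines with the new default of 0.6, so callers that relied on the old behavior would need to pass `threshold=0.5` explicitly.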