refactor: modify bbox processing for layout separation

- Remove overlap between bboxes for block separation
- Sort bboxes by combined x and y coordinates for better layout handling
- Comment out previous overlap removal function
myhloli 11 months ago
parent
commit
b3127233f0
1 changed file, 2 additions and 2 deletions

+ 2 - 2
magic_pdf/pre_proc/ocr_detect_all_bboxes.py

@@ -117,8 +117,8 @@ def ocr_prepare_bboxes_for_layout_split_v2(
     all_bboxes = remove_overlaps_min_blocks(all_bboxes)
     all_discarded_blocks = remove_overlaps_min_blocks(all_discarded_blocks)
     """Separate the remaining bboxes to prevent errors in the later layout split"""
-    all_bboxes, drop_reasons = remove_overlap_between_bbox_for_block(all_bboxes)
-
+    # all_bboxes, drop_reasons = remove_overlap_between_bbox_for_block(all_bboxes)
+    all_bboxes.sort(key=lambda x: x[0]+x[1])
     return all_bboxes, all_discarded_blocks
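
To illustrate what the new sort key does, here is a minimal sketch on made-up data. It assumes each entry starts with `x0, y0` (top-left corner) followed by `x1, y1`; the fifth field is a hypothetical label added only for readability and is not part of the real `all_bboxes` entries.

```python
# Hypothetical boxes: [x0, y0, x1, y1, label]. The label is illustrative only.
boxes = [
    [300, 50, 500, 100, "right-top"],
    [50, 400, 250, 450, "left-bottom"],
    [50, 50, 250, 100, "left-top"],
]

# Same key as in the commit: order by the sum of the top-left coordinates.
boxes.sort(key=lambda b: b[0] + b[1])

order = [b[4] for b in boxes]
print(order)
```

With these values the sums are 100, 350 and 450, so the right-hand box at the top sorts ahead of the left-hand box lower on the page. `list.sort` is stable, so boxes with equal sums keep their previous relative order.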