fix(pdf_parse): improve line stop flag detection accuracy

- Add a condition to the line stop flag check in calculate_char_in_span
- Require the character's horizontal center (char_center_x) to lie to the right of the span's left boundary (span_bbox[0])
- This change helps reduce false positives in line stop detection
myhloli 1 year ago
parent
commit
ae3b0a1e60
1 changed file with 1 addition and 0 deletions

magic_pdf/pdf_parse_union_core_v2.py (+1, -0)

@@ -151,6 +151,7 @@ def calculate_char_in_span(char_bbox, span_bbox, char_is_line_stop_flag):
         if char_is_line_stop_flag:
             if (
                 (span_bbox[2] - span_height) < char_bbox[0] < span_bbox[2]
+                and char_center_x > span_bbox[0]
                 and span_bbox[1] < char_center_y < span_bbox[3]
                 and abs(char_center_y - span_center_y) < span_height / 4
             ):
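
To illustrate the kind of false positive the added condition is meant to filter out, here is a minimal standalone sketch of the check. It is not the project's code: the helper name `in_line_stop_zone` and the `require_right_of_left_edge` switch are hypothetical, and it assumes bounding boxes are `(x0, y0, x1, y1)` tuples with centers and `span_height` computed as plain midpoints and differences, which the hunk above does not show.

```python
def in_line_stop_zone(char_bbox, span_bbox, require_right_of_left_edge=True):
    """Hypothetical re-statement of the line stop check from the hunk.

    Assumes bboxes are (x0, y0, x1, y1); the way the real function derives
    span_height and the center values is not visible in the diff.
    """
    span_height = span_bbox[3] - span_bbox[1]
    span_center_y = (span_bbox[1] + span_bbox[3]) / 2
    char_center_x = (char_bbox[0] + char_bbox[2]) / 2
    char_center_y = (char_bbox[1] + char_bbox[3]) / 2

    checks = [
        # Character starts within one span-height of the span's right edge.
        (span_bbox[2] - span_height) < char_bbox[0] < span_bbox[2],
        # Character is vertically inside the span and close to its midline.
        span_bbox[1] < char_center_y < span_bbox[3],
        abs(char_center_y - span_center_y) < span_height / 4,
    ]
    if require_right_of_left_edge:
        # The condition added by this commit.
        checks.append(char_center_x > span_bbox[0])
    return all(checks)


# A span that is narrower (10) than it is tall (20): the one-span-height
# window to the left of its right edge reaches past its own left edge.
narrow_span = (100, 0, 110, 20)
stray_char = (92, 8, 96, 12)    # lies entirely left of the span
own_char = (104, 8, 108, 12)    # lies inside the span

print(in_line_stop_zone(stray_char, narrow_span, require_right_of_left_edge=False))
print(in_line_stop_zone(stray_char, narrow_span))
print(in_line_stop_zone(own_char, narrow_span))
```

Under these assumptions, the stray character passes the check without the new condition and is rejected with it, while the character inside the span is accepted either way. Whether narrow spans are the case the author had in mind is not stated in the commit message.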