
refactor(pdf): adjust span filling threshold in block construction

Raise the threshold passed to fill_spans_in_blocks in parse_page_core from 0.3 to 0.5 so that spans are grouped into blocks more accurately. This may improve the structure and readability of the parsed PDF content.
myhloli 1 year ago
parent
commit
7e301b849b
1 changed file with 1 addition and 1 deletion

+ 1 - 1
magic_pdf/pdf_parse_union_core_v2.py

@@ -360,7 +360,7 @@ def parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter,
                                                need_drop, drop_reason)
 
     '''Fill spans into blocks'''
-    block_with_spans, spans = fill_spans_in_blocks(all_bboxes, spans, 0.3)
+    block_with_spans, spans = fill_spans_in_blocks(all_bboxes, spans, 0.5)
 
     '''Apply fix operations to the blocks'''
     fix_blocks = fix_block_spans(block_with_spans, img_blocks, table_blocks)
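
Note on the threshold change: the body of fill_spans_in_blocks is not part of this diff, so the following is only a minimal sketch of what such a threshold might control. It assumes the third argument is a minimum overlap ratio (the fraction of a span's area that must fall inside a block's bounding box before the span is assigned to that block). The helpers `overlap_ratio` and `assign_spans` are hypothetical names written for illustration and are not functions from magic_pdf.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_ratio(span: BBox, block: BBox) -> float:
    """Fraction of the span's area that lies inside the block (hypothetical helper)."""
    x0 = max(span[0], block[0])
    y0 = max(span[1], block[1])
    x1 = min(span[2], block[2])
    y1 = min(span[3], block[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    span_area = (span[2] - span[0]) * (span[3] - span[1])
    return inter / span_area if span_area > 0 else 0.0


def assign_spans(
    blocks: List[BBox], spans: List[BBox], threshold: float
) -> Tuple[List[List[BBox]], List[BBox]]:
    """Assign each span to the first block it overlaps by more than `threshold`.

    Returns the spans grouped per block and the spans left unassigned.
    This mirrors the assumed behavior only; the real function may differ.
    """
    per_block: List[List[BBox]] = [[] for _ in blocks]
    leftover: List[BBox] = []
    for span in spans:
        for i, block in enumerate(blocks):
            if overlap_ratio(span, block) > threshold:
                per_block[i].append(span)
                break
        else:
            # No block overlapped the span enough.
            leftover.append(span)
    return per_block, leftover


block = (0.0, 0.0, 100.0, 20.0)
span_inside = (10.0, 5.0, 50.0, 15.0)        # entirely inside the block
span_straddling = (80.0, 5.0, 130.0, 15.0)   # 20 of its 50 units of width are inside

loose_blocks, loose_left = assign_spans([block], [span_inside, span_straddling], 0.3)
strict_blocks, strict_left = assign_spans([block], [span_inside, span_straddling], 0.5)
```

Under this assumption, a span that straddles a block edge with 40% of its area inside is attached to the block at 0.3 but left unassigned at 0.5, which is the kind of borderline case the new value would exclude.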