fix(ocr_mkcontent): expand para_to_standard_format_v2 to handle list and index blocks

- Modified the condition to include the List and Index block types
- This change lets the function emit List and Index paragraphs as 'text' content, the same way it already handles Text blocks
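
A minimal, self-contained sketch of the widened condition, for illustration only. The `BlockType` values, the simplified `merge_para_with_text`, and the helper name `to_text_content` are hypothetical stand-ins, not the real magic_pdf definitions, and the sketch leaves out every other branch of `para_to_standard_format_v2`.

```python
class BlockType:
    # Hypothetical string values; the real constants are defined in magic_pdf.
    Text = 'text'
    List = 'list'
    Index = 'index'
    Title = 'title'


def merge_para_with_text(para_block):
    # Simplified stand-in: joins pre-extracted strings instead of walking spans.
    return ' '.join(para_block.get('lines', []))


def to_text_content(para_block):
    """Build a 'text' content dict for Text, List and Index blocks."""
    para_content = {}
    # Membership test replaces the former equality check against BlockType.Text.
    if para_block['type'] in [BlockType.Text, BlockType.List, BlockType.Index]:
        para_content = {
            'type': 'text',
            'text': merge_para_with_text(para_block),
        }
    return para_content


list_block = {'type': BlockType.List, 'lines': ['first item', 'second item']}
title_block = {'type': BlockType.Title, 'lines': ['Heading']}
print(to_text_content(list_block))
print(to_text_content(title_block))
```

In this sketch a Title block falls through and yields an empty dict; the real function handles other block types in branches that are not shown here.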
myhloli 1 year ago
parent
commit
644085760b
1 file changed with 1 addition and 1 deletion
+ 1 - 1
magic_pdf/dict2md/ocr_mkcontent.py

@@ -162,7 +162,7 @@ def merge_para_with_text(para_block):
 def para_to_standard_format_v2(para_block, img_buket_path, page_idx, drop_reason=None):
     para_type = para_block['type']
     para_content = {}
-    if para_type == BlockType.Text:
+    if para_type in [BlockType.Text, BlockType.List, BlockType.Index]:
         para_content = {
             'type': 'text',
             'text': merge_para_with_text(para_block),