
fix: remove unnecessary fields from block data in para_split.py

myhloli committed 4 months ago
Commit: fefe2d36d4
1 file changed, with 4 additions and 0 deletions

mineru/backend/pipeline/para_split.py (+4 -0)

@@ -368,6 +368,10 @@ def para_split(page_info_list):
             if block['page_num'] == page_info['page_idx']:
                 page_info['para_blocks'].append(block)
 
+            # Remove the unneeded page_num and page_size fields from the block
+            del block['page_num']
+            del block['page_size']
+
 
 if __name__ == '__main__':
     input_blocks = []
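For illustration, the page-assignment step with this cleanup applied might look like the following. This is a minimal sketch, not the project's code. The helper name `assign_blocks_to_pages` and its `all_blocks` argument are hypothetical. The sketch assumes, as the hunk's context lines suggest, that each page's pass scans one shared list of blocks. Because a block's `page_num` is deleted as soon as the block is assigned, the sketch skips blocks that an earlier page's pass already stripped. Indexing `block['page_num']` on such a block would otherwise raise a `KeyError`.

```python
from typing import Any


def assign_blocks_to_pages(page_info_list: list[dict[str, Any]],
                           all_blocks: list[dict[str, Any]]) -> None:
    """Hypothetical helper (not MinerU's API): distribute blocks to pages,
    then drop the bookkeeping fields.

    Assumes each block carries 'page_num' and 'page_size' keys added earlier
    and each page dict carries a 'page_idx' key.
    """
    for page_info in page_info_list:
        page_info['para_blocks'] = []
        for block in all_blocks:
            # Blocks assigned on an earlier page's pass have no 'page_num' left.
            if 'page_num' not in block:
                continue
            if block['page_num'] == page_info['page_idx']:
                page_info['para_blocks'].append(block)
                # Strip the helper fields only after the block has been placed.
                del block['page_num']
                del block['page_size']


if __name__ == '__main__':
    pages = [{'page_idx': 0}, {'page_idx': 1}]
    blocks = [
        {'type': 'text', 'page_num': 0, 'page_size': (612, 792)},
        {'type': 'text', 'page_num': 1, 'page_size': (612, 792)},
    ]
    assign_blocks_to_pages(pages, blocks)
    print(pages[1]['para_blocks'])  # prints [{'type': 'text'}]
```

Guarding with the `'page_num' not in block` check keeps each block's deletion to a single pass. The alternative of collecting all assignments first and stripping the fields in a final loop would work equally well.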