fix(pdf-extract-kit): ensure table extraction success with additional ending condition

Add an additional condition to determine the success of table extraction by checking
if the latex_code ends with 'end{table}'. This extends the validation to cover table
environments that may not strictly end with 'end{tabular}', thus improving the robustness
of table recognition processing.

myhloli, 1 year ago
Parent
commit
334ccac24e
1 file changed, with 2 additions and 1 deletion

+ 2 - 1
magic_pdf/model/pdf_extract_kit.py

@@ -285,7 +285,8 @@ class CustomPEKModel:
                 if run_time > self.table_max_time:
                     logger.warning(f"------------table recognition processing exceeds max time {self.table_max_time}s----------")
                 # Check whether the result was returned normally
-                if latex_code and latex_code.strip().endswith('end{tabular}'):
+                expected_ending = latex_code.strip().endswith('end{tabular}') or latex_code.strip().endswith('end{table}')
+                if latex_code and expected_ending:
                     res["latex"] = latex_code
                 else:
                     logger.warning(f"------------table recognition processing fails----------")
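
A note on the added lines: expected_ending is computed by calling latex_code.strip() before latex_code is tested for truthiness. An empty string is harmless there, but if the model can ever return None, the call would raise AttributeError instead of reaching the warning branch. The following is a minimal sketch of a None-safe variant, not the code in this commit. The helper name is_complete_table and the constant VALID_ENDINGS are hypothetical and do not exist in magic_pdf. It relies on str.endswith accepting a tuple of suffixes.

```python
# Hypothetical helper, not part of magic_pdf: a None-safe version of the check.
VALID_ENDINGS = ("end{tabular}", "end{table}")


def is_complete_table(latex_code):
    """Return True if latex_code is a non-empty string with a known table ending."""
    # Test truthiness first so that None or "" never reaches .strip().
    if not latex_code:
        return False
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return latex_code.strip().endswith(VALID_ENDINGS)
```

With such a helper, the condition in the hunk could read "if is_complete_table(latex_code):", keeping both accepted endings in one place if more need to be added later.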