fix(ocr): handle inline equations consistently with text content

- Include InlineEquation in the condition for handling text content
- Remove separate block for InlineEquation processing
- Ensure inline equations and text are handled the same way, improving content formatting
myhloli, 1 year ago
commit 87b9eeee59
1 file changed, with 1 addition and 3 deletions
+ 1 - 3
magic_pdf/dict2md/ocr_mkcontent.py

@@ -153,7 +153,7 @@ def merge_para_with_text(para_block):
                     elif span_type == ContentType.InlineEquation:
                         para_text += f" {content} "
                 else:
-                    if span_type == ContentType.Text:
+                    if span_type in [ContentType.Text, ContentType.InlineEquation]:
                         # If the previous line ends with a hyphen, no trailing space should be added
                         if __is_hyphen_at_line_end(content):
                             para_text += content[:-1]
@@ -161,8 +161,6 @@ def merge_para_with_text(para_block):
                             para_text += f"{content.strip()} "
                     elif span_type == ContentType.InterlineEquation:
                         para_text += content
-                    elif span_type == ContentType.InlineEquation:
-                        para_text += f"{content} "
             else:
                 continue
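
The effect of the change can be sketched in isolation. This is a minimal illustration, not the project's code: the `ContentType` stand-in, its string values, the helper `is_hyphen_at_line_end` (including its regex), and the function `append_span` are all assumptions made for the example, and the `else` between the two hunks is inferred from the visible context lines.

```python
import re
from enum import Enum


class ContentType(str, Enum):
    # Hypothetical stand-in for the project's ContentType; values are assumed.
    Text = "text"
    InlineEquation = "inline_equation"
    InterlineEquation = "interline_equation"


def is_hyphen_at_line_end(content: str) -> bool:
    # Assumed behavior: a word immediately followed by "-" at the end of the span.
    return bool(re.search(r"[A-Za-z]+-\s*$", content))


def append_span(para_text: str, span_type: ContentType, content: str) -> str:
    # After the commit, Text and InlineEquation share one branch.
    if span_type in [ContentType.Text, ContentType.InlineEquation]:
        if is_hyphen_at_line_end(content):
            # Drop the last character (the hyphen) and add no trailing space.
            return para_text + content[:-1]
        return para_text + f"{content.strip()} "
    if span_type == ContentType.InterlineEquation:
        return para_text + content
    # Other span types are skipped in this sketch.
    return para_text


text = ""
for span_type, content in [
    (ContentType.Text, "The exam-"),
    (ContentType.Text, "ple uses"),
    (ContentType.InlineEquation, " $x^2$ "),
]:
    text = append_span(text, span_type, content)

print(repr(text))  # 'The example uses $x^2$ '
```

Under these assumptions, the inline equation is now stripped before the trailing space is appended, whereas the removed branch appended `f"{content} "` with any surrounding whitespace left in place.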