Known Issues
============

- Reading order is determined by the model’s sorting of text by its
  spatial distribution, and may come out wrong for extremely complex
  layouts.
- Vertical text is not supported.
- Tables of contents and lists are recognized by rules; a few uncommon
  list formats may not be identified.
- Only one heading level is supported; hierarchical headings are not
  currently supported.
- Code blocks are not yet supported in the layout model.
- Comic books, art books, elementary school textbooks, and exercise
  books are not yet parsed well.
- Enabling OCR may produce better results for PDFs with a high density
  of formulas.
- If you are processing PDFs with a large number of formulas, enabling
  OCR is strongly recommended. When PyMuPDF is used to extract text,
  text lines can overlap, which leads to inaccurate formula insertion
  positions.
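
To illustrate the overlapping-line issue in the last item, here is a minimal sketch of how one might check a page for overlapping text lines before deciding whether to enable OCR. It is not the project's own detection logic: the function names and the ``min_ratio`` threshold of 0.5 are assumptions made for this example. The optional helper ``page_line_bboxes`` uses PyMuPDF's ``page.get_text("dict")`` to collect line bounding boxes and requires PyMuPDF to be installed; the overlap check itself is plain Python.

```python
from itertools import combinations


def vertical_overlap_ratio(a, b):
    """Shared height of two boxes divided by the height of the shorter box."""
    top = max(a[1], b[1])
    bottom = min(a[3], b[3])
    shared = max(0.0, bottom - top)
    shorter = min(a[3] - a[1], b[3] - b[1])
    return shared / shorter if shorter > 0 else 0.0


def horizontally_intersect(a, b):
    """True if the x-ranges of two (x0, y0, x1, y1) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2]


def overlapping_line_pairs(bboxes, min_ratio=0.5):
    """Return index pairs of line boxes that overlap each other.

    min_ratio is a hypothetical threshold chosen for this sketch.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(bboxes), 2):
        if horizontally_intersect(a, b) and vertical_overlap_ratio(a, b) >= min_ratio:
            pairs.append((i, j))
    return pairs


def page_line_bboxes(pdf_path, page_number=0):
    """Collect text-line bounding boxes from one page (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported here so the rest runs without it

    with fitz.open(pdf_path) as doc:
        blocks = doc[page_number].get_text("dict")["blocks"]
    # Image blocks carry no "lines" key, so fall back to an empty list.
    return [tuple(line["bbox"]) for block in blocks for line in block.get("lines", [])]
```

A page for which ``overlapping_line_pairs(page_line_bboxes(path))`` returns many pairs is a plausible candidate for OCR, though the right cutoff depends on the documents being processed.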